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ABSTRACT 

A technique is provided for creating metadata for fast search 
of XML documents stored as column data. Data is stored in 
a data store connected to a computer. Amain table is created 



having a column for storing a document, wherein the docu- 
ment has one or more elements or attributes. One or more 
side tables are created, wherein each side table stores one or 
more elements or attributes. Then, the side tables are used to 
locate data in the main table with scalable indexing mecha- 
nisms to facilitate search. 

A technique is provided for generating one or more XML 
documents from a single SQL query. Data stored on a data 
storage device that is connected to a computer is trans- 
formed. A query that selects data in the data storage device 
is received. The selected data is retrieved into a work space. 
Then, one or more XML documents are generated to consist 
of the selected data. 

A technique is provided for generating one or more XML 
documents from a relational database using the XPath data 
model. Data stored on a data storage device that is connected 
to a computer is transformed. Initially, a document object 
model tree is generated using a document access definition, 
which defines the mapping between an XML tree structure 
and relational tables. The document object model tree is 
traversed to obtain information to retrieve relational data. 
The relational data is mapped to one or more XML docu- 
ments. 

A technique is provided to store fragmented XML data into 
a relational database by decomposing XML documents with 
application specific mappings. Data stored on a data store 
that is connected to a computer is transformed. Initially, an 
XML document containing XML data is received. A docu- 
ment access definition that identifies one or more relational 
tables and columns is received. The XML data is mapped 
from the application DTD to the relational tables and 
columns using the document access definition based on the 
XPath data model. 
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XML DOCUMENT PROCESSING 

PROVISIONAL APPLICATION 

[0001] This application claims the benefit of U.S. Provi- 
sional Application No. 60/168,659, entitled "XML DOCU- 
MENT PROCESSING," filed on Dec. 2, 1999, by Isaac 
Cheng, et al., attorney's reference number ST9-99-106, 
which is incorporated by reference herein. 

FIELD OF THE INVENTION 

[0002] This invention relates in general to computer- 
implemented database systems, and, in particular, to pro- 
cessing Extensible Markup Language (XML) documents. 

BACKGROUND OF THE INVENTION 

[0003] The Internet is a collection of computer networks 
that exchange information via Hyper Text Transfer Protocol 
(HTTP). The Internet computer network consists of many 
internet networks. Currently, the use of the Internet com- 
puter network for commercial and non-commercial uses is 
exploding. Via its networks, the Internet computer network 
enables many users in different locations to access informa- 
tion stored in data sources (e.g., databases) stored in differ- 
ent locations. 

[0004] The World Wide Web (Le., the "WWW" or the 
"Web") is a hypertext information and communication sys- 
tem used on the Internet computer network with data com- 
munications operating according to a client/server model. 
Typically, a Web client computer will request data stored in 
data sources from a Web server computer, at which Web 
server software resides. The Web server software interacts 
with an interface connected to, for example, a Database 
Management System ("DBMS"), which is connected to the 
data sources. These computer programs residing at the Web 
server computer will retrieve the data and transmit the data 
to the client computer. The data can be any type of infor- 
mation, including database data, static data, HTML data, or 
dynamically generated data. 

[0005] With the fast growing popularity of the Internet and 
the World Wide Web (also known as "WWW" or the 
"Web"), there is also a fast growing demand for Web access 
to databases. 

[0006] Databases are computerized information storage 
and retrieval systems. A Relational Database Management 
System (RDBMS) is a database management system 
(DBMS) which uses relational techniques for storing and 
retrieving data. Relational databases are organized into 
physical tables which consist of rows and columns of data. 
The rows are formally called tuples. A database will typi- 
cally have many physical tables and each physical table will 
typically have multiple tuples and multiple columns. The 
physical tables are typically stored on random access storage 
devices (RASED) such as magnetic or optical disk drives for 
semi-permanent storage. Additionally, logical tables or 
"views" can be generated based on the physical tables and 
provide a particular way of looking at the database. A view 
arranges rows in some order, without affecting the physical 
organization of the database. 

[0007] RDBMS software using a Structured Query Lan- 
guage (SQL) interface is well known in the art. The SQL 
interface has evolved into a standard language for RDBMS 



software and has been adopted as such by both the American 
National Standards Institute (ANSI) and the International 
Standards Organization (ISO). 

[0008] The SQL interface allows users to formulate rela- 
tional operations on the tables either interactively, in batch 
files, or embedded in host languages, such as C and COBOL. 
SQL allows the user to manipulate the data. The definitions 
for SQL provide that a RDBMS should respond to a par- 
ticular query with a particular set of data given a specified 
database content, but the technique that the RDBMS uses to 
actually find the required information in the tables on the 
disk drives is left up to the RDBMS. Typically, there will be 
more than one technique that can be used by the RDBMS to 
access the required data. The RDBMS will optimize the 
technique used to find the data requested in a query in order 
to minimize the computer time used and, therefore, the cost 
of performing the query. 

[0009] Additionally, an index is an ordered set of refer- 
ences to the records or rows in a database file or table. The 
index is used to access each record in the file using a key 
(i.e., one of the fields of the record or attributes of the row). 
When data is to be retrieved, an index is used to locate 
records. Then, the data is sorted into a user-specified order 
and returned to the user. 

[0010] Extensible Markup Language (XML) is a new 
specification that is quickly gaining popularity for creating 
what are termed "XML documents". XML documents com- 
prise structured data. XML documents are being shared 
between multiple businesses and between businesses and 
customers. 

[0011] When XML documents are stored as column data, 
searching for desired XML data can be time-consuming. 
Typically, a search for XML data would require searching 
each XML document. This is usually called a document 
scan. Thus, there is a need in the art for an improved 
technique for searching for XML documents stored as 
column data. 

[0012] With the longstanding use of relational databases, 
many businesses have stored their data in relational tables. 
In order to share this data with businesses that are using 
XML documents, the data in the relational databases may be 
manually selected, retrieved, and stored into XML docu- 
ments. This is a long, tedious task. Thus, there is a need for 
an improved technique of selecting, retrieving, and storing 
relational data into XML documents. 
[0013] In order to share relational data with other busi- 
nesses that are using XML documents, a user may manually 
convert the relational data into XML documents. This is time 
consuming and inefficient. Thus, there is a need for an 
improved technique of generating XML documents from 
relational data. 

[0014] Additionally, when an XML document is received, 
a user may need to store the data from the XML document 
into a relational database. Currently, this is a time consuming 
processing in which a user manually transfers the data from 
the XML document to the relational database. Thus, there is 
a need for an improved technique of decomposing an XML 
document and storing the decomposed data into a relational 
database. 

SUMMARY OF THE INVENTION 
[0015] To overcome the limitations in the prior art 
described above, and to overcome other limitations that will 
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become apparent upon reading and understanding the 
present specification, the present invention discloses a 
method, apparatus, and article of manufacture for a com- 
puter implemented technique for processing XML docu- 
ments. 

[0016] In accordance with one aspect of the present inven- 
tion, data is stored in a data store connected to a computer. 
A main table is created having a column for storing a 
document, wherein the document has one or more elements 
or attributes. One or more side tables are created, wherein 
each side table stores one or more elements or attributes. 
Then, the side tables are used to locate data in the main table. 

[0017] In accordance with another aspect of the present 
invention, data stored on a data storage device that is 
connected to a computer is transformed. A query that selects 
data in the data storage device is received. The selected data 
is retrieved into a work space. Then, one or more XML 
documents are generated to consist of the selected data. 

[0018] In accordance with yet another aspect of the 
present invention, data stored on a data storage device that 
is connected to a computer is transformed. Initially, a 
document object model tree is generated using a document 
access definition. The document object model tree is tra- 
versed to obtain information to retrieve relational data. The 
relational data is mapped to one or more XML documents. 

[0019] In accordance with a further aspect of the present 
invention, data stored on a data store that is connected to a 
computer is transformed. Initially, an XML document con- 
taining XML data is received. A document access definition 
that identifies one or more relational tables and columns is 
received. The XML data is mapped from the application 
DTD to the relational tables and columns using the docu- 
ment access definition based on the XPath data model. 

BRIEF DESCRIPTION OF THE DRAWINGS 

[0020] Referring now to the drawings in which like ref- 
erence numbers represent corresponding parts throughout: 

[0021] FIG. 1 schematically illustrates the hardware envi- 
ronment of an embodiment of the present invention, and 
more particularly, illustrates a typical distributed computer 
system using a network. 

[0022] FIG. 2 is a diagram illustrating a computer soft- 
ware environment that could be used in accordance with the 
present invention. 

[0023] FIG. 3 illustrates an application or main table and 
its four side tables. 

[0024] FIG. 4 is a flow diagram illustrating steps per- 
formed by the XML System in creating and maintaining 
XML document data as column data. 

[0025] FIG. 5 is a flow diagram of steps performed by the 
XML System to enable a column. 

[0026] FIG. 6 is a flow diagram of steps performed by the 
XML System to disable a column. 

[0027] FIG. 7 is a diagram illustrating code organization 
to compose XML documents. 

[0028] FIG. 8 is a block diagram illustrating components 
of the XML System in one embodiment of the invention. 



[0029] FIG. 9 is a flow diagram illustrating the steps 
performed by the XML System to transform relational data 
into one or more XML documents using SQL mapping. 

[0030] FIG. 10 is a flow diagram illustrating the process 
performed by the XML system using RDBnode mapping to 
compose XML documents. 

[0031] FIG. 11 is a flow diagram illustrating the steps 
performed by the XML System to decompose XML docu- 
ments with application specific mappings. 

DETAILED DESCRIPTION 

[0032] In the following description of an embodiment of 
the invention, reference is made to the accompanying draw- 
ings which form a part hereof, and which is shown by way 
of illustration a specific embodiment in which the invention 
may be practiced. It is to be understood that other embodi- 
ments may be utilized as structural changes may be made 
without departing from the scope of the present invention. 

A Hardware Architecture 

[0033] FIG. 1 schematically illustrates the hardware envi- 
ronment of an embodiment of the present invention, and 
more particularly, illustrates a typical distributed computer 
system using a network 100 to connect client computers 102 
executing client applications to a server computer 104 
executing software and other computer programs, and to 
connect the server system 104 to data sources 106. A typical 
combination of resources may include client computers 102 
that are personal computers or workstations, and a server 
computer 104 that is a personal computer, workstation, 
minicomputer, or mainframe. These systems are coupled to 
one another by various networks, including LANs, WANs, 
SNA networks, and the Internet Each client computer 102 
and the server computer 104 additionally comprise an oper- 
ating system and one or more computer programs. 

[0034] A client computer 102 typically executes a client 
application and is coupled to a server computer 104 execut- 
ing one or more server software. The server software may 
include an XML system 110. The server computer 104 also 
uses a data store interface and, possibly, other computer 
programs, for connecting to the data sources 106. The client 
computer 102 is bi-directionally coupled with the server 
computer 104 over a line or via a wireless system. In turn, 
the server computer 104 is bi-directionally coupled with data 
sources 106. 

[0035] The data store interface may be connected to a 
Database Management System (DBMS), which supports 
access to a data store 106 by executing, for example, 
RDBMS software. The interface and DBMS may be located 
at the server computer 104 or may be located on one or more 
separate machines. The data sources 106 may be geographi- 
cally distributed. 

[0036] The operating system and computer programs are 
comprised of instructions which, when read and executed by 
the client and server computers 102 and 104, cause the client 
and server computers 102 and 104 to perform the steps 
necessary to implement and/or use the present invention. 
Generally, the operating system and computer programs are 
tangibly embodied in and/or readable from a device, carrier, 
or media, such as memory, other data storage devices, and/or 
data communications devices. Under control of the operat- 
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iog system, the computer programs may be loaded from 
memory, other data storage devices and/or data communi- 
cations devices into the memory of the computer for use 
during actual operations. 

[0037] Thus, the present invention may be implemented as 
a method, apparatus, or article of manufacture using stan- 
dard programming and/or engineering techniques to produce 
software, firmware, hardware, or any combination thereof. 
The term "article of manufacture** (or alternatively, "com- 
puter program product**) as used herein is intended to 
encompass a computer program accessible from any com- 
puter-readable device, carrier, or media. Of course, those 
skilled in the art will recognize many modifications may be 
made to this configuration without departing from the scope 
of the present invention. 

[0038] Those skilled in the art will recognize that the 
exemplary environment illustrated in FIG. 1 is not intended 
to limit the present invention. Indeed, those skilled in the art 
will recognize that other alternative hardware environments 
may be used without departing from the scope of the present 
invention. 

B. XML Background 
[0039] Extensible Markup Language (XML) is a subset of 
Standard Generalized Markup Language (SGML). XML 
works in conjunction with Extensible Stylesheet Language 
Transformation, (XSLT) and Extensible Markup Language 
Path (XPath). XML may also work in conjunction with a 
Document Object Model (DOM) or Namespace. 

[0040] Extensible Markup Language (XML) is a subset of 
Standard Generalized Markup Language (SGML). XML is 
described in XML 1.0, found at the following web site: 
http://www.w3.org/TR/REC-xml. Extensible Markup Lan- 
guage (XML) is a set of rules or guidelines for designing text 
formats for structured data using tags. Additional detail may 
be found at the following web site: http://www.w3.org^ 
XML/1999/XML-in-lO-points, For interoperability, 
domain-specific tags called a vocabulary can be standard- 
ized using a Document Type Definition, so that applications 
in that domain understand the meaning of the tags. 

[0041] Extensible Style Language Transformer or XSLT is 
a language for transforming XML documents into other 
XML documents. The XSLT specification defines the syntax 
and semantics of the XSLT language. XSLT-defined ele- 
ments are distinguished by belonging to a specific XML 
namespace, which is referred to as the XSLT namespace. A 
transformation expressed in XSLT describes rules for trans- 
forming a source tree into a result tree. Further detail about 
XSLT may be found at http://www.w3.org^TR/xslt. 

[0042] XML Path or XPath addresses parts of an XML 
document. XPath gets its name from its use of a path 
notation as in URLs for navigating through the hierarchical 
structure of an XML document. Further detail about XML 
path may be found at http://www.w3.org/TR/xpath. 

[0043] A Document Object Model (DOM) is a standard set 
of function calls for manipulating XML files from a pro- 
gramming language. Additional detail may be found at the 
following web site: http ://www.w3 .org/TR/REC-DOM- 
Level-1/. 

C. Overview of the XML System 
[0044] In one embodiment of the invention, the XML 
System comprises the XML Extender from International 



Business Machines, Corporation, of Armonk, N.Y. The 
XML System offers the capability of XML storage and data 
interchange. By storage, the XML System provides mecha- 
nisms for storing and retrieving XML documents in a 
relational database (e.g., DB2® from International Business 
Machines, Corporation) and searching the content of XML 
with high performance. By data interchange, the XML 
System provides a mapping between new and existing 
relational tables and XML formatted documents. Thus, the 
XML System allows customers to do e-business anywhere, 
enabling XML with Business to Business (B2B) and Busi- 
ness to Consumer (B2C) applications. For B2B applications, 
application data flows between database servers, via any 
network (e.g., the internet or an intranet), either directly 
without client interaction or indirectly via some client sys- 
tems. For B2C applications, application data flows between 
a consumer at, for example, a workstation, and a server 
connected via a network (e.g., between database servers and 
web clients via the internet). Thus, the XML System sup- 
ports Business to Business (B2B) and Business to Client 
(B2C) applications. In both cases, the following require- 
ments will apply: 

[0045] Performance 

[0046] Scalability 

[0047] Integration with existing business logic 

[0048] Smart query support 

[0049] Legacy file support 

[0050] Developer efficiency 

[0051] FIG. 2 is a diagram illustrating a computer hard- 
ware environment that could be used in accordance with the 
present invention. In one embodiment, the DB2 XML 
Extender 200, a product from International Business 
Machines, Corporation, is at the center of the architecture. 
An application program 202 and a document access defini- 
tion (DAD) 204 are received by the DB2 XML Extender 
200. The DB2 XML Extender 200 takes an XML document 
206 as the input, stores the XML document 206 in DB2 210 
(i.e., a relational database) either internally inside DB2 210 
or externally on the file system as one or more XML files 
208. Then, the stored XML document 206 can also be 
retrieved from DB2 210 or the file system through the DB2 
Extender 200. The processing performed by the DB2 XML 
Extender 200 will be described in more detail below. 

[0052] In another embodiment, an application program 
202 and a document access definition (DAD) 204 are 
received by the DB2 XML Extender 200. The DB2 XML 
Extender 200 takes an XML document 206 as input, decom- 
poses the XML document 206 into fragmented data and 
stores the fragmented data in DB2 210 (i.e., a relational 
database). Then, the fragmented data stored in DB2 210 can 
be regenerated from DB2 210 through the DB2 Extender 
200. The processing performed by the DB2 XML Extender 
200 will be described in more detail below. 

[0053] Those skilled in the art will recognize that the 
environment illustrated in FIG- 2 is not intended to limit the 
present invention. Indeed, those skilled in the art will 
recognize that other alternative hardware environments may 
be used without departing from the scope of the present 
invention. 
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[0054] C. 1 Applications 

[0055] Different types of applications can benefit from the 
use of the XML System. Some illustrations follow: 

[0056] Business to Business (B2B) applications for 
E-Commerce: 

[0057] B2B applications mainly use XML as their inter- 
change format, such as Electronic Data Interchange (EDI). 
The XML System enables maintaining native XML format- 
ted documents, as well as mapping data into/from relational 
tables. With native XML formatted documents, XML 
enables storing entire XML documents into a database and 
searching on known elements or attributes. With mapping, 
XML System enables an application builder who knows the 
relational data model of particular business tables to custom 
map XML content to or from existing tables. 

[0058] Web Information Retrieval Applications: 

[0059] These are B2C applications which are often used in 
interactive Web sites, such as sites for insurance and real- 
estate industries. The XML documents are usually not very 
large in size, but have structured information. 

[0060] The XML System enables storing entire XML 
documents into a database and using SQL to do a fast search 
on desired XML elements or attributes with rich data types. 
Range search for rich data types is often important. Addi- 
tionally, the XML System enables retrieving data from 
existing business tables and from XML documents and 
putting them on a web site for viewing. 

[0061] For example, an insurance company may set up a 
call center system in which agents retrieve phone calls from 
their customers. The information is collected, and the case is 
archived. The XML System is used to store entire XML 
documents in a database. Then, an insurance agent can easily 
display an insurance case on a screen. The XML System also 
provides a fast and powerful search of these insurance cases, 
so the insurance agent can quickly retrieve information 
while still on the phone with a customer. Additionally, 
alternative ways of searching for information, i.e. numbers, 
text wildcards, key words, etc., are provided by XML 
System. 

[0062] Content Management: 

[0063] This type of application provides advanced content 
management functions to a user. A user could use XML 
System as physical storage, and have fast search with 
indexing. The XML documents are usually large in size. In 
some cases, it is desirable to partition the XML documents 
into multiple pieces and perform update in place. 

[0064] As an extender to DB2®, XML System enhances 
DB2® functionality for XML enablement. That is, XML 
System enables use of SQL as the main access technique, 
along with database features of: stored procedures, user 
defined types (UDT) and user defined functions (UDF). 

[0065] The XML System meets the following require- 
ments: 

[0066] Physical Storage: for entire document, or 
shredded structured data, with data types. 

[0067] Support of flat files: allows data to be stored in 
flat files, and imported/exported to/from database. 



[0068] Access via SQL: supports field search, full 
text search, structural search. 

[0069] Indexing facility: builds on different data 
types for better query performance. 

[0070] Updates XML element/attribute. 

[0071] Mapping to/from relational: composes/de- 
composes XML documents from/to data store in 
relational tables. 

[0072] NLV support: support of double byte charac- 
ters. 

[0073] C.2 XColumns and XCollections 

[0074] XML System provides good data and metadata 
management solutions to handle traditional and non-tradi- 
tional data. With the content of structured XML documents 
in a database, a user can combine structured XML informa- 
tion with traditional relational data. Based on the applica- 
tion, a user can choose whether to store entire XML docu- 
ments in a database as a non-traditional distinct data type or 
map the XML content as traditional data in relational tables. 
For non-traditional XML data types, the XML System adds 
the power to search rich data types of XML element or 
attribute values. For traditional SQL data, that is either 
decomposed from incoming XML documents or in existing 
relational tables to be used to create outgoing XML docu- 
ments, the XML System provides a custom mapping mecha- 
nism to allow the transformation between XML documents 
and relational data. 

[0075] The XML System offers the flexibility to store 
entire XML documents as column data or transform between 
XML documents and data in existing tables. The transfor- 
mation includes decomposing an XML document into one or 
multiple pieces and storing the pieces in the form of rela- 
tional data, as well as, composing XML documents from the 
data in existing relational tables. A user can decide how 
structured XML documents are to be stored or created 
through a Document Access Definition (DAD). 

[0076] The DAD itself is an XML formatted document. 
The DAD associates XML documents to a database through 
two major access and storage techniques by defining ele- 
ments Xcolumn and Xcollection. Xcolumn defines how to 
store and retrieve entire XML documents as column data of 
the XML user defined type (UDT). An XML column is a 
column of XML System's user defined type (UDT). Appli- 
cations can include an XML column in any user table. 
Operations on the XML column can be processed after the 
column is enabled with the XML System. A user can access 
XML column data mainly through the SQL statements and 
XML System's user defined function (UDF). With the 
different access and storage techniques, the XML System 
provides the flexibility of XML data storage and retrieval. 

[0077] In particular, an XML column is used to store entire 
XML documents in the native XML format. This approach 
treats XML format as an non-traditional data type and offers 
user defined types (UDTs) and user defined functions 
(UDFs) for a fast, versatile, and intelligent technique for 
searching through XML documents. The XML System gives 
applications the freedom to specify a list of XML elements/ 
attributes as general SQL data types for fast search. The 
XML System will extract these values from the XML 
documents and store them in side tables so that a user can 
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create indices on them. The application can query these side 
tables or join them with the application (i.e., "main") table 
to do a fast search. For example, a user can input a query 
such as: "give me all the documents whose prices are greater 
than $2500.00", providing 2500.00 is the value of an XML 
element or attribute inside the XML documents. 

[0078] The XML System provides several user defined 
types (UDTs) for XML columns. These data types are used 
to identify the storage types of XML documents in the 
application table. The XML System supports legacy flat 
files, and a user is not restricted to storing XML documents 
inside a database. A user can also store XML documents as 
files on the local or remote file system, specified by a URL 
or a local file name. 

[0079] The XML System provides powerful user-defined 
function (UDF)s to store and retrieve XML documents in 
XML columns, as well as to extract XML element/attribute 
values. The UDFs are applied to XML user defined types 
(UDTs), thus, these are mainly used for XML columns. 

[0080] An Xcollecuon defines how to decompose XML 
documents into a collection of relational tables or to com- 
pose XML documents from a collection of relational tables. 
An XML collection is a virtual name of a set of relational 
tables. Applications can enable an XML collection of any 
user tables. These user tables can be existing tables of legacy 
business data or the ones newly created by the XML System. 
A user can access XML collection data mainly through the 
stored procedures provided by the XML System. 

[0081] An XML collection is used to transform data 
between database tables and XML documents. An XML 
collection achieves the goal of data interchange via XML. 
For applications that want to compose or decompose XML 
documents from/into a set of relational tables, the XML 
System offers a technique to enable an XML collection 
through a Document Access Definition (DAD). In the Docu- 
ment Access Definition, applications can make a custom 
mapping between database column data in new or existing 
tables to XML elements or attributes. The access to an XML 
collection is by calling XML System's stored procedures or 
directly querying to the tables of the collection. 

[0082] The XML System also allows overrides of query 
conditions explicity or implicitly defined in the DAD, by 
parsing the SQL or XML XPath based override parameter to 
the composition stored procedures. In this way, it supports 
dynamic query for generating XML documents. 

[0083] With the XML System, an application can: 

[0084] Store entire XML documents as column data 
in an application table, either internally or externally 
as a local file or URL, while extracting desired XML 
element or attribute values into side tables for search. 

[0085] Compose or decompose contents of XML 
documents from/into an XML collection which con- 
sists of one or more relational tables. 

[0086] Perform fast search on XML elements or 
attributes of SQL general data types by converting 
character string in XML documents to SQL data 
types for indexing. 

[0087] Update the content of an XML element, or the 
value of an XML attribute. 



[0088] Extract XML elements or attributes dynami- 
cally in SQL query. 

[0089] Validate XML documents during insertion 
and update. 

[0090] The XML System also serves as an XML document 
type definition (DTD) repository. When a database is XML 
enabled, a DTD Reference Table (DTD_REF) is created. 
Each row of this table represents a DID, with additional 
metadata information. This table is accessible by users, and 
allows them to insert their own DTDs. The DTDs in the 
DTD_REF table are used to validate XML documents and to 
help applications to define a document access definition 
(DAD). 

[0091] C.3 Terminology 

[0092] This section clarifies some terminology used in this 
specification. 

[0093] The XML System uses a subset of Extensive 
Stylesheet Language Transformation (XSLT) and XML Path 
Language (XPath), Version 1.0, the W3C working draft of 
Jun. 17, 1999, to identify XML elements or attributes. The 
content of the XPath is originally in the XSLT and now is 
referred by the XSLT, as a part of the stylesheet transfor- 
mation language. Location path is used to define XML 
elements and attributes. The XSLT/XPauYs abbreviated 
syntax of the absolute location path is used. 

[0094] The following is not a formal data model, but a set 
of abbreviated syntax. The notation of the absolute location 
path with abbreviated syntax supported by the XML System 
is listed below. 

[0095] This section clarifies some terminology used in this 
specification. 

[0096] The XML System uses a subset of Extensive 
Stylesheet Language Transformation (XSLT) and XML Path 
Language (XPath), Version 1.0, the W3C working draft of 
Jun. 17, 1999, to identify XML elements or attributes. Hie 
content of the XPath is originally in the XSLT and now it is 
referred to by XSLT as a part of the stylesheet transforma- 
tion language. Previously, the term "path expression" was 
used. Now, a subset of the term location path is used in 
XSLT and XPath to define XML elements and attributes. 
The XSLT/XPath's abbreviated syntax of the absolute loca- 
tion path is used. 

[0097] The following is not a formal data model, but a set 
of abbreviated syntax. An absolute location path with abbre- 
viated syntax is listed below. This is supported by the XML 
System. Again, these are not formal definitions. 

[0098] a.V: 

[0099] Represents the XML root element. 
[0100] b. "/tagl": 

[0101] Represents the element tagl under root. 

[0102] c. w /tagl/tag2/ . . . /tagn": 

[0103] Represents an element with the name tagn 
as the child with the descending chain from root, 
tagl, tag2, . . . , tagn-1 
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[0104] d. "//tagn" 

[0105] Represents any element with the name tagn, 
where U IT denotes zero or more arbitrary tags. 

[0106] e. 7tagl//tagp" 

[0107] Represents any element with the name tagn 
which is a child of element with the name tagl 
under root, where "//" denotes zero or more arbi- 
trary tags. 

[0108] f. "AaglAag2/®attrl w 

[0109] Represents the attribute attrl of element 
with the name tag2 as a child of element tagl 
under root. 

[0110] g. M /Ugl/tagZt@attrl- u 5T 

[0111] Represents the element with the name tag2 
whose attribute attrl has the value '5' and it is a 
child of element with the name tagl under root. 

[0112] h. 7Ugl/tag24;@attrl- u 5'']/ . . . Aagn" 

[0113] Represents the element with the name tagn 
which is a child of the descending chain from root, 
tagl, tag2, . . . where the attribute attrl of tag2 has 
the value '5*. 

[0114] i. 7tagl/tag2/tag3- M Los Angeles"/ . . . Aagn" 

[0115] Represents the element with the name tagn 
which is a child of the descending chain from root, 
tagl, tag2, . . . where tag3 has the value "Los 
Angeles'*. 

[0116] j. "Aagl/tag2/*[@atttl-"5"r 

[0117] Represents all elements as children of ele- 
ment w /tagl/tag2" with attrl of value "5". 

[0118] There are restrictions on the location path when 
used by the XML System, and these are listed in the table 
below. 



TABLE 



TABLE 



Simple Location Path of an Element and an Attribute 
Subject location path Description 



XML 


/tag_l/tag__2/...,/tag_ji-:l/tag_n 


an element con- 


Element 




tent identified by 






the tag n and its 






parents 


XML 


/tag__l/tag_2/...7ta&_n-l/tag_n7@attrl 


an attribute with 


Attribute 




name "attrl" of 






the element 






identified by 






tag_n and it 






parents 



[0121] The location path identifies the structure part that 
indicates the document context to be found. An empty path 
signals the structure to search or extract against is the whole 
document (same effect as if the location path is the root 
element). 

[0122] The XML System provides users the ability to 
create SQL queries on XML documents. Based on the nature 
of XML documents and the functionality of the XML 
System, the following terminology is used: 



Document Access 
Definitioa(DAD): 

Partition: 



Location path: 



Side table: 



Vklid Document: 



Well-formed document: 



Restriction of Location Path Supported 




Use of the Location Path 


Location Path Supported 


XML Attribute: 


Extracting UDFs 


H 




Text Extender's search UDF 






DAD column definition 


c, f (simple location path) 


XML Column: 



[0119] Note that there is a restriction in the DAD column 
definition because there is a one-to-one mapping between an 
element or attribute to a column. 

[0120] The term simple location path refers to the c and f 
notations in the table for Restriction of Location Path 
Supported. The simple location path is a sequence of ele- 
ment type names connected by the "T notation. Each 
element type may be qualified by its attribute values. 



XML Collection: 



XML Element: 



XML Tag: 



The definition used to enable an XML 

System column or an XML collection, 

which is XML formatted. 

The term partition used means the full 

partition. In other words, the union of all 

partitioned parts forms the original 

document 

A subset of the abbreviated syntax of the 
location path defined by XPath. A 
sequence of XML tags to identify an 
XML element or attribute. It is used in the 
extracting UDFs to identify the subject to 
be extracted. The terms of path expression 
and location path may be used 
interchangeably. 

Additional tables created by the XML 
System to store searchable 
elements/attributes for an enabled XML 
Column. 

An XML document that has an associated 
DTD. To be valid, the XML document 
cannot violate the syntactic rules specified 
in its DTD. 

An XML document that does not contain 
a DTD. A document with a DTD (valid) 
must also be well-formed. 
Any attribute specified by the ATTUST 
under the XML clement in the DTD. The 
XML System uses the location path to 
identify an attribute. 

A column in the application table of the 
XML System UDT type. The term of 
XML enabled column and XML column 
will be used interchangeably. 
A collection of relational tables which 
present the data to compose XML 
documents or to be decomposed from 
XML documents. 

Any XML tag or ELEMENT as specified 
in the XML DTD. The XML System uses 
the location path to identify an element. 
Any valid XML markup language tag, 
mainly the XML element. The term tag 
and element are used interchangeably. 
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-continued 



XML Tabic: An XML Table is an application table 

which includes XML System. column(s). 
The terms XML enabled table and XML 
table arc used interchangeably. 

XML Object: The terms XML Object and XML 

document are used interchangeably. 

XML UDT: User defined type provided by the XML 

System. 

XML UDF: User defined function provided by the 

XML System. 



[0123] C.4 Example of an XML DTD 



[0125] In the above Lineltem.dtd, the term Lineltem.dtd is 
the title of the Document Type Definition. The term <?xml 
encoding-"US-ASCir?> indicates that encoding is in US- 
ASCII. The terms beginning with ELEMENT refer to ele- 
ments of an XML document, and the terms beginning with 
ATTLIST refer to attributes of an XML document. The DTD 
is used to verify a Document Access Definition. 

[0126] C.5 Example of an XML Document 

[0127] The following is an example of an XML formatted 
document: 



ordeucml 

<?xml veision-"l.<r?> 

<!DOCTYPE Litem. DTD SYSTEM "E:\dn\test\d tdUinelteiiLdtd^ 
<Order Key-**1"> 

<Customer>Gencral Motor </Customer> 

<Part Key-" 15 6" > 

<Quantity>l 7</Quantity> 

<ExtcndedPrice>17954^5<^ExtendedPrice> 

<Tax>0.02</Tax> 

<Shipment> 

<SmpDate>1998-05-15</SnipDate> 
<ShipMode>TRUCK</ShipMode> 

<Omiment>This is the first shipment to service of GM</Comment> 

</Shipment> 

<Shipment> 

<ShipDate>1999-01-16</ShipDate> 
<ShipMode>FEDEX</ShipMode> 

<Comment>This the second shipment to service of GM.</Comment> 

</Shipment> 

</Part> 

<Part Kcy-"68"> 

<Quantitv>36</Quaniity> 

<ExteodedI^rice>348S0.16</ExtcndedPricc> 

<Tai>O.0S</Iax> 

<Shipmcnt> 

<ShipDate>199*^12^ShipDate> 
<5hipMode>BOAT</SnipMode> 

<Comment>This shipment is requested by a call, from GM marketing. <yComment> 

</Shipment> 

<shipment> 

<ShiDate>1998-08-19^ShipDatc> 
<ShipMode>AIR</ShipMode> 

<Comment>This shipment is ordered by an email. <^Comment> 

<Shipment> 

</Part> 

</Order> 



[0124] The following DTD is provided as an example: 



Lin el tern. did 

<?xml encoding-"US-ASar?> 

<! ELEMENT Order (customer,Part+)> 

<!ATTUST Order Key CD ATA #REQUIRED> 

<! ELEMENT Customer *PCDATA> 

<! ELEMENT rVt(Oiiajitity3xtendedPrice,Tax^hipniettt')> 

<!ATTUST Part Key CDATA> 

<! ELEMENT Quantity (#PCDATA)> 

<! ELEMENT ExtendedPrice (#PCDATA)> 

<! ELEMENT Tax (#PCDATA)> 

<I ELEMENT Shipment (ShipDate^hipMode,Comment)> 
<! ELEMENT ShipDate (#PCDATA)> 
<! ELEMENT ShipMode (#PCDATA)> 
<! ELEMENT Comment (#PCDATA)> 



[0128] In the above XML document, the term order.xml is 
the title of the XML document. The term <?xml version- 
"l.(T?> indicates that this document is based on XML 
Version 1.0. The term <!DOCTYPE Litem_DTD SYSTEM 
w E:Wxx\test\dtd\LineItem.dtd"> is text for the XML docu- 
ment type definition and references the example Document 
Type Definition, entitled Lineltem.dtd, in C.4, which is used 
for validation. 

[0129] The remaining terms define the data in the XML 
document. For example, the term <Quantity>17</Quantity> 
indicates that quantity has a value of 17. Also, note that 
<Quantity> without a slash at the beginning defines a start 
tag and </Quantity> with a slash at the beginning defines an 
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end tag. Similarly, other terms in the XML document use 
such tags. 

[0130] C.6 The Document Access Definition (DAD) 

[0131] A user decides how XML document data is to be 
accessed in a database. That is, the 150 

[0132] user defines a DAD. With the help of a Graphical 
User Interface (GUI) tool, the user can create a DAD to 
define a mapping and indexing scheme. 

[0133] A Document Access Definition(DAD) is defined by 
the following Document Type Definition (DTD): 



• daddtd 

<?xmi encoding-*US-ASCir?> 

<I ELEMENT DAD (dtdid?, validation, (Xcohimn (Xcollection))> 
<I ELEMENT dtdid (#PCDATA)> 
<! ELEMENT validation (#PCDATA)> 
<! ELEMENT Xcohimn (tablc*)> 
<! ELEMENT table (column •> 
<! ATTLISr table name CD ATA #REQU1RED 
key CDATA #tMPLf£D 
orderBy CDATA #XMPLIKD> 
<! ELEMENT column EMPTY> 
<!ATTLIST column 

name CDATA #REQUIRED 
type CDATA #1 MP LIED 
path CDATA #tMPUED 
multLoccuixence CDATA «MPIJED> 
<! ELEMENT Xcoliection (SQLstmt*, prolog, doctype, root.node)> 
< I ELEMENT SQLjJtmt (#PCDATA)> 
<! ELEMENT prolog (#PCDATA)> 
<!ELEMENT doctypc (#PCDATA |RDB_nodc)"> 
<!ELEMENT root node (element.node)> 
<! ELEMENT clcment_node (RDB_node?, 
attribute .node*, 
tcxt.node?, 
clcment.node 
namespace node", 
process. ins tructioii. node * , 
commcnt.node *)> 
<!ATTLIST clcment_node 

name CDATA #REQU1RED 

ID CDATA #IMPLIED 

multi.occurrence CDATA "NCT 
BASE URI CDATA #TMFUED> 
<! ELEMENT attribute .node (column |RDB.node)> 
<!ATTliST attribute.node 

name CDATA #REQUIRED> 
<! ELEMENT text.node (column [RDBjiode)> 
<! ELEMENT RDBjiode (table+, column?, condition?)> 
<! ELEMENT condition (#PCDATA> 
<! ELEMENT commcnt.node (#PCDATA)> 
<! ELEMENT namcspace.node EMPTY> 
<1ATTLIST namcspace.node 

name CDATA JtfMPLIED 
value CDATA #TMPLIED> 
<! ELEMENT proocss.uistructioa.node (#PCDATA)> 



[0134] The XML System Administration GUI will provide 
an interface to create DAD files. The DAD itself is a tree 
structured XML document. The important elements and 
attributes of the DAD are: 

[0135] DTDID: 

[0136] The identifier of the DTD stored in the dtd_ref 
table. It represents the DTD which validates the XML 
documents or guides the mapping between XML collection 
tables and XML documents. DTDID must be specified for 
XML collections. For XML columns, it is optional and is 



only needed if you want to create side tables for indexing on 
elements/attributes or validate new XML documents. The 
DTDID must be the same as the SYSTEM ID specified in 
the "doc type" of the XML documents. 

[0137] Validation 

[0138] For validating XML documents with the DTD, and 
"No" for no validation. If "Yes", then the DTDID must also 
be specified. 

[0139] Xcohimn 

[0140] An Xcolumn defines the indexing scheme for an 
XML Column. It is composed by zero or more tables. 

[0141] table: 

[0142] The relational side table(s) created for index- 
ing elements or attributes of documents stored in an 
XML column. You can have one or more tables. A 
table is specified by: 

[0143] name: 

[0144] name of the side table. 
[0145] column: 

[0146] The column of the side table, which contains 
the value of a location path of the specified type. 

[0147] name: name of the column. It is the alias 
name of the location path which identifies an 
element or attribute, 

[0148] type: the data type of the column. It can be 
any SQL data type. 

[0149] path: the location path of an XML element 
or attribute. Only a simple location path defined in 
Section C.3 is allowed here. 

[0150] multi_occurrence: "YES" or "NO" to 
specify whether this element or attribute will have 
in one XML document. For Xcolumn, if multi_o- 
ccurrence is specified as "YES", the XML System 
will add another column "DXX_SEQNO" with 
type Integer in the side table which this column 
belong to. This DXX_SEQNO keeps track of the 
order of elements occurred for the path expression 
in each inserted XML documents. With DXX_SE- 
QNO, the user can retrieve a list of the elements 
with the same order as the original XML docu- 
ment using "ORDER BY DXX_SEQNCT in SQL. 

[0151] Xcoliection 

[0152] The Xcoliection defines mapping between XML 
documents and an XML collection of relational tables. It is 
composed by the following elements: 

[0153] SQL stmt 

[0154] The SQL statement to specify the operation 
needed to achieve the mapping. It must be a valid 
SQL statement. It is only needed for composition, 
and only one SQLstmt of query is allowed. 

[0155] objids 

[0156] A list of identifiers, each of which conceptu- 
ally identifies a row object in the database table, so 
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that the row to be selected is ordered by this unique 
value. It is only needed when SQL_stmt is supplied. 
The ID can be a column name, or a value generated 
from the generate_jinique( ) function or a UDF. It is 
recommended but not necessary to be the primary 
key of the table. 

[0157] prolog 

[0158] Text for the XML prolog. The same prolog is 
supplied to all documents in the entire collection. It 
is a fixed text. This because only XML 1.0 is 
supported, and UDB® only supports UTF-8. 

[0159] doctype 

[0160] Text for the XML document type definition. 
Hie doctype can be specified in one of the following 
two ways: 

[0161] The same doctype is supplied to all docu- 
ments in the entire collection. In this case, it is a 
fixed text. 

[0162] When decomposing, the doctype can be 
stored as a column data of a table. In this case, the 
RDBjiode should be specified. 

[0163] root_node 

[0164] The virtual root node which must has one and 
only one element_node. The element_node under the 
root_node is actually the root_node of the XML 
document. 

[0165] RDB_node: 

[0166] The node defines the mapping between an 
XML element or attribute and relational data. It 
consists of: 

[0167] table: 

[0168] name: the name of a relational table in 
which the data of an XML element or attribute 
reside. 

[0169] key: the primary single key of the table. It 
must be specified for decomposition. 

[0170] For the root eleroent_node, all tables stor- 
ing its attribute or all child element data should be 
specified. 

[0171] orderBy: names of columns that determine 
the sequence order of multiple-occurring element 
text or attribute value when generating XML 
documents. 

[0172] column: 

[0173] It must be specified for text_node or 
attribute_oode, but not for the element_node. 

[0174] name: name of the column which contains 
the value of an XML element or attribute. It must 
be specified for both composition and decompo- 
sition. 

[0175] type: the data type of the column. It is 
needed only for decomposition. 



[0176] path: the location path of the element or 
attribute. It is not needed for Xcollection, only for 
Xcolumn. 

[0177] multi_occurrence: multiple occurrence of 
the element or attribute. 

[0178] condition: the predicate to specify query 
condition. It serves two purposes: 

[0179] In RDB oode of a text_node or 
attribute_node: if specified, it qualifies the con- 
dition to select the column data to be used to 
compose or decompose XML element text or 
attribute value. It is optional. 

[0180] In RDB_node of the root element_node: 
if more than one tables are supplied, it must be 
specified as the condition to join tables. 

[0181] eIement_node 

[0182] Representing an XML element. It must be 
defined in the specified DTD. For the RDBjiode 
mapping, the root elementnode must have a RDB- 
_node to specify all tables containing XML data for 
itself and all its children nodes. It can have zero or 
more attribute_nodes and child element_nodes, as 
well as zero or one text_node. In the next release, an 
element_node can also contain namespace_nodes, 
process_instruction_nodes and comment_node. 

[0183] An element_node is defined by: 

[0184] Attributes: 

[0185] name: The name of the XML element. It is 
the tag name. 

[0186] ID: The unique ID. This is adapted from 
XPTH. 

[0187] BASE_URI: The base URI for the name 
space. This is also adapted from XPTH. 

[0188] Optional RDB_node: 

[0189] The RDB node is only needed for the root 
element_node when using RDB_node mapping. In 
this case, all tables involved to generate or decom- 
pose XML documents must be specified. The column 
is not needed. The condition must be specified to 
show the join relationship among tables. 

[0190] Optional child nodes: 

[0191] An element_node can also have the following 
child nodes: 

[0192] e!ement_node(s): representing child ele- 
ments) of this element, 

[0193] attribute_node(s): representing attribute^) 
of this element; 

[0194] text_node: representtne CDAIA text of this 
element, 

[0195] comment_node: representing the comment 
for this element, 

[0196] namespace_node: representing the 
namespace of this element, 
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[0197] process_instruction_node: representing the 
process instruction, 

[0198] attributenode: 

[0199] Representing an XML attribute. It is the node 
defining the mapping between an XML attribute and 
the column data in a relational table. It must has a 
name, and a column or a RDB_node. 

[0200] Attribute: 

[0201] name: the name of the attribute. It must be 
defined in the DTD. 

[0202] Column or RDBjiode: 

[0203] Column: needed for the SQL mapping. In 
this case, the column must be in the SQL_stmt's 
SELECT clause. 

[0204] RDB_node: needed for the RDB_node 
mapping. The node defines the mapping between 
this attribute and the column data in the relational 
table. The table and column must be specified. The 
condition is optional. 

[0205] text_node: 

[0206] Representing the text content of an XML 
element. It is the node defining the mapping between 
an XML element content and the column data in a 
relational table. It must be defined by a column or a 
RDBnode. 

[0207] Column: needed for the SQL mapping. In 
this case, the column must be in the SQL_stmt , s 
SELECT clause. 

[0208] RDB_node: needed for the RDB_node 
mapping. The node defines the mapping between 
this text content and the column data in the 
relational table. The table and column must be 
specified. The condition is optional. 

D. Creating Metadata for Fast Search of XML 
Documents Stored as Column Data 

[0209] One embodiment of the invention provides an 
XML System which solves the problem of fast searching and 
indexing of XML element/attribute values of XML docu- 
ments when they are stored inside a database as column data. 

[02 10] An XML document is a structured document XML 
lets a user structure a document by elements or attributes 
(e.g., title or author). Once a document is structured in this 
manner, a structured search man be performed based on 
element or attribute values (or content). 

[0211] The embodiment of the invention converts the 
characters of element/attribute values to any general SQL 
data type. Additionally, the embodiment of the invention 
provides a technique for performing a range search on the 
data. That means the element or attribute values are con- 
verted to SQL types (e.g., number of pages may be an 
integer). With this embodiment of the invention, indices can 
be created on XML element/attribute values, thus the search 
operation is scalable. 

[0212] The embodiment of the invention permits applica- 
tion programmers to define a Data Access Definition (DAD) 
which identifies the XML elements or attributes that need to 



be indexed and defines the mapping between XML elements 
or attributes to columns in one or more side tables. The DAD 
is an XML formatted document that is used to specify within 
an XML document which elements or attributes are to be 
searched. The DAD also provides a location path or XPath. 
For example, if elements of a book are structured as follows: 

[0213] | Book 

[0214] | — -Title 

[0215] | -Author 

[0216] The location path for the above structure would be: 
/BoolyTitle/Author. 

[0217] Additionally, the embodiment of the invention 
stores XML document data in an application table, while 
storing particular elements or attributes in side tables. The 
data stored in the side tables is referred to as "metadata" and 
is used to search for elements or attributes in the XML 
documents stored as column data in the application table. 
During the enabling of a column which contains XML 
documents, side tables are created (based on the DAD) to 
store duplicate data of these elements or attributes. Several 
triggers are created so that values of these elements or 
attributes are extracted when operations are performed on 
XML documents in columns of an application table. The 
operations include, for example, insert operations on the 
application table, which trigger insert operations to also 
store the inserted XML data into the side tables. Triggers 
also manage the synchronization of XML data between the 
side table data during the deleting and updating operations 
on the column containing the XML documents in the appli- 
cation table. 

[0218] D.l Indexing for Searching XML Columns 

[0219] The indexing mechanism is applied on XML col- 
umns. In particular, the indexing mechanism discussed here 
is a technique to create an index on XML element or 
attribute values when entire XML documents are stored in 
XML columns. 

[0220] With a large collection of XML documents, search 
performance is a critical user requirement. Index support 
provides fast query performance at the cost of slower update 
performance due to index updates. The XML System pro- 
vides an indexing mechanism that allows search predicates 
at query-time to be evaluated through indices, without 
reading document sources. 

[0221] The XML column indexing mechanism allows 
frequently queried data of general data types, such as 
integer, decimal, or date, to be indexed using the native 
database index supports from the database engine. This is 
achieved by extracting the values of XML elements or 
attributes from XML documents, storing them in the side 
tables, then allowing application programmers to create 
indices on these side tables. In a DAD, a user can define 
Xcolumns by specifying each column of a side table with a 
location path that identifies an XML element or attribute and 
a desired SQL data type. The XML System then will 
populate these side tables when data is inserted into the 
application table. An application can create an index on 
these columns for fast search, using the database B-tree 
indexing technology. The technique and options for creating 
an index may vary across platforms. Application program- 
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mers have the freedom to create a desired index as they 
usually do with a database od their platform. 

[0222] For elements/attributes in an XML document 
which occur multiple times, a separate table is created for 
each XML element/attribute with multiple occurrences, due 
to the complex structure of XML documents. 

[0223] For example, a user may want to create an index on 
VOrder/Part/ExtendedPrice', and specify VOrder/Part/Ex- 
tcndedPrice' to be of data type REAL. In this case, XML 
System will store the value of VOrder/Part/ExtendedPrice' 
in the specified column "price* in a side table. Multiple 
indices on an XML column are allowed. In the example, a 
user can create two columns in two side tables, one for 
'ExtendedPrice* and one for "ShipDate". 

[0224] When side tables are created, they are tied together 
with the main (or application) table through the notion of 
rootjd. A user can decide whether the primary key of the 
application table is to be the "root_id". If the primary key 
does not exist in the application table, or for some reason a 
user doesn't want to use the primary key, then XML System 
will alter application table to add a column DXXROOT ID 
for storing a unique identifier created at insertion time (i.e., 
when data is inserted into the application or main table). All 
side tables will have a "DXXROOTJD" column and have 
the unique identifiers stored. If the primary key is used as the 
root_id, then all side tables will have a column with the same 
name and type as the primary key column in the application 
table, and the values of the primary keys are stored. 

[0225] D.2 Sample DAD for an XML Column 

[0226] Assuming the XML documents need to be stored 
are like the one shown in C_5. 

[0227] Example of an XML Document, the following 
example DAD will store the XML documents in an XML 
column and create several side tables for indexing. 



Liiem.DADl.d21d 

<?xml version-" 3. (T7> 
<!DOCTYFE Older SYSTEM -EAdtd\dxtdad.clur> 
<DAD> 

<dtdid>E:\dld\l incite m.dtd</dtdid> 
<validation> YES </validatio o> 
<Xcolumn> 

ctable name— **order.tab"> 

ccokunn name»**order^kcy" 
type-** integer" 
patWVOrder/@Key'* 
multi_occurrence-**NO'7> 
ccolumn name— "customer" 
type-"varchar(50)" 
path— "^OrtfcryOistomcr'* 
mul tLoccurre nee-** NO" / > 

<Aablc> 

<table name-**part.tab**> 

ccolumn name-** part )te y" 
type-"integer" 
palh-**/Order/Paxt/@Kjcy" 
mulrioccurrence"**YES"/> 

Diablo 

ctable name -"price. tab" > 
ccolumn name—** price** 
type-** double** 

path="/Order/rVrt/ExlendedPrice'' 
mul*i.ocairrence»**YES*7> 



-continued 

</table> 

ctable name— **ship_tab"> 

ccolumn nam e— "date " 
type-" date" 

paiJ^/OrderTP art/Ship ment/Ship Dale" 
multi.occurrence- M YES*7> 

<jtable> 
c/Xcolumn> 
</DAD> 



[0228] In the above DAD, Litem_DADl.dad is the name 
of the DAD. The phrase <?xml version-" l.(T?> identifies 
the version, and the phrase <!DOCTYPE Order SYSTEM 
"E:\dtd\dxxdad.dtd"> is text for the XML document type 
definition. The first DAD and the second DAD tags indicate 
that the information between these tags comprise the data 
access definition. The phrase <dtdid>E:\dtd\HneItem.dtd</ 
dtdid> identifies the document type definition (DTD) to be 
used. The phrase <validation>YES<A^alidation> indicates 
that this DAD is to be validated against the DTD. The four 
table name terms identify the four side tables to be created. 

[0229] In this example, the four side tables created for 
indexing are as follows: 

[0230] order_tab: with columns of orderjkey and 
customer, representing attribute "/OrderAgJKey" and 
element "/Order/Customer" . 

[0231] partjab: with column of partjcey, represent- 
ing attribute "/Order/PartAaKey". 

[0232] price_tab: with column of price, representing 
element "/Order/Part/Price" 

[0233] ship tab: with column of date, representing 
element "/Order/Part/Shipment/ShipDate , \ 

[0234] For this example, it is assumed that the columns in 
the tables are the elements and attributes which need to be 
searched frequently. 

[0235] FIG. 3 illustrates an application or main table and 
its four side tables. The Application table 300 has a root_id 
in common with each side table 302, 304, 306, and 308. The 
side tables 302, 304, 306, and 308 correspond to the side 
tables defined in the DAD above. 

[0236] D3 XML Column/User Defined Types 

[0237] An XML column is designed to store XML docu- 
ments in their native format in the database as column data. 
After a database is enabled, the following user defined types 
(UDTs) are created: 

[0238] XMLCLOB: XML document content stored 
as a CLOB inside the database, 

[0239] XMLVarchan XML document content stored 
as a VARCHAR inside the database, 

[0240] XMLDBCLOB: XML document content 
stored as double byte CLOB inside the database, 

[0241] XMLFile: XML document stored in a file on 
a local file system, 

[0242] XMLURL: XML document stored as a uni- 
form resource locator (URL) via Data Link. 
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[0243] A user can use these UDTs as the data type of ao 
XML column. Ad XML column is created when a user 
creates or alters an application table. 

[0244] D.4 Creating an XML Table 

[0245] An XML table is a table that includes one or more 
columns created with the XML System UDT. To create such 
a table, an XML column is included in the column clause of 
the CREATE TABLE statement. 

[0246] Consider a line item order book keeping applica- 
tion. The XML formatted line item order is to be stored in 
a column called "order" of an application table called 
"salesjab". The sales_tab table also includes other columns 
of invo ice_n um be r and sales_person. Since the order is not 
very long, a user may decide to store it in the XMLVarchar 
type. The user may also decide to let the invoice_number be 
the primary key. Hie following create table statement can be 
used, where XMLVarchar is the XML System UDT: 



CREATE TABLE salea.tab 

( invoicc.numbcr char(6) NOT NULL PRIMARY KEY, 
sales .person varchai(20), 
order XMLVarchar); 



[0247] D.5 Defining Xcoiumn in DAD 

[0248] la order to use an XML column, a DAD needs to 
be prepared and enabled. In DAD preparation, a user first 
needs to define an "Xcoiumn". The following steps guide a 
user to define an "Xcoiumn", using the examples: XML 
document order.xml in C.5, DTD Lineltem.dtd in C.4, and 
DAD Litem_DADl.dad in D.2. 

[0249] Identify the XML elements and attributes 
which will be frequently searched in the application. 

[0250] In the above examples, the 70rder/!Key", 
"/Order/Customer", "/Oider/Part/@Kcy", "/Or- 
der/Part/ExtendedPrice", "/Order/Part/Shipment/ 
ShipDate" are mostly like to be searched and 
range search is needed for some of them. 

[0251] Decide how many side tables will be created 
for indexing. This is based on the understanding of 
the DTD and XML documents. 

[0252] In the above examples, since "/Order" has 
unique attribute "Key" and only one element "Cus- 
tomer**, they are put in the same side table 
"order_tab". One "Order" can have one or more 
"Part" items (see DTD definition in C.4), and each 
"Part** will have unique attribute "Key" and ele- 
ment "ExtendedPrice", and so these are separates 
into two tables: "part_lab" and "price_tab". Now, 
since one "Part" can have multiple "Shipment" 
items and each "Shipment" has one "ShipDate", 
the "ShipDate" is put into another table 
"shipjab". 

[0253] Define the column of each side table by speci- 
fying the column name, the matching XML element 
or attribute by location path, and the data type. 

[0254] In the examples, the ability to perform 
range search is desired on "ExtendedPrice" and 



"ShipDate", thus the data type is specified to be 
double and date respectively. Because there will 
be multiple occurrences of the /Order/Part/@key, 
/Order/Part/ExtendedPrice and /Order/Part/Ship- 
ment/ShipDate, specify the multi_occurre nee - 
"YES" for these elements or attributes. By doing 
so, the XML System will create an additional 
column DXX_SEQNO for side table price_tab 
and ship_tab so that a query can be performed 
using "order by DXXSEQNO" to get the element 
or attribute with the same order as that in the 
original XML documents. 

[0255] D.6 Enabling Parameters 

[0256] A column can be enabled through the XML System 
administration GUI or using a dxxadm command with the 
enable_column option. The syntax of the option is as fol- 
lows: 

[0257] dxxadm enabIe_coIumn db_name tab_name col- 
umn_name DAD_file -t tablespace -v default_view -r 
root_id] 

[0258] where: 

[0259] db_name. the database name 

[0260] tab_name: table name in which the XML 
column resides. 

[0261] cohimn_name: name of the XML column. 

[0262] DAD file: name of the file that contains 
Data Access Definition (DAD). 

[0263] tablespace: optional, but if specified, a pre- 
viously created tablespace which will contain side 
tables created by the XML System. 

[0264] default_view: optional, but if specified, it is 
the name of the default view created by XML 
System to join application table and all side tables. 

[0265] root^id: optional, but recommended, and if 
specified, it is the column name of the primary key 
in the application table, and XML System will use 
it as the unique "root_id" to tie all side tables with 
the application table. If not specified, XML Sys- 
tem will add the column of DXXROOTJD in the 
application table. Note: if the application table 
happened to have a column name as 
"DXXROOTJD", the primary key must be speci- 
fied as the "root_id", otherwise, an error will be 
returned. 

[0266] Here is an example for enabling the column order 
in the table sales_tab in database mydb with the DAD_file 
Litem_DADl.dad in C.4, default view sales_order_view 
and root_id invoice_number. 

[0267] /bome/ul>dxxadm enable_column mydb sales_tab 
order Litem_DADl.dad -v sales_order_view -r invoice_ 
number 
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[0268] DXXA007I XML Extender is enabling column 
order. Please waiL 

[0269] DXXA0081 XML Extender has successfully 
enabled the column order. 

[0270] /bome/ul> 

[0271] D.7 Results of the Column Enabling 

[0272] The enabling of an XML column mainly does the 
following things to a database: 

[0273] Read the DAD_file and do the following: 

[0274] if DTDID is specified, retrieve the DTD 
from the dld_ref table. 

[0275] process Xcolumn to create side tables 

[0276] create triggers for insert, update and delete 
on the XML column so that the side tables will be 
populated or updated. 

[0277] Create a default_view if specified. 

[0278] If root_id not specified, alter application table 
to add DXXROOTJD column. 

[0279] Update the XML_USAGE and dtd_ref table 
to reflect the enabling of this XML column. 

[0280] Based on the above examples, the user table sales 
tab has the following schema: 

[0281] Based on the above examples, the user table sal- 
estab has the following schema: 



User tabic aales__tab: 

Column Name invoice __number salcs_pcrson 
Data Type chax(6) varchar(20) 



order 

XMLVbrchar 



[0282] The enabling column operation will create the 
following side tables based on the DAD: 



Side_ table order tab: 



Column Name order key 

Data Type integer 
Location Path /Order/@Key 



customer invoice number 

varchar(50) char(6) 
/Order/Customer N/A 



Column Name 
Data Type 
Location Path 



Side table part_tab: 

parl_Jcey 
integer 

/Order/ParV@Kcy 



invoice_number 

char(6) 

N/A 



Side table price__ b: 

Column Name price invo ice number 

Data Type double chax(6) 

Location Path /Ordc r/Part/ExtendedPrice N/A 

Side table ship_tab: 

Column Name date invoice-number 

Data Type date char(6) 

Location Path /Ordc r/Part/Shipmenl/Ship Date N/A 



[0283] Note that because the root_id is specified by the 
primary key invoice_number in the application table sal- 
es_tab, all side tables have the column invoice_number of 
the same type. Also, the value of the mvoice jaumber of each 
row in the salestab will be inserted into the side tables. 

[0284] Since the default_view parameter is specified when 
enabling the XML column order, a default view sales_or- 
der_view is created by the XML System. It joins the above 
five tables by the following statement: 

[0285] CREATE VIEW sales_order_view(invoice- 
_number,sales_person,order, order_key,customer,part- 
_key,price,date) 

[0286] AS 

[0287] SELECT sales_tab.invoice_number, salesjab- 
^ales_person, sales_tab.order, order_tab.order_key, 
order_tab.customer, part_tab.part_key, price_tab.price, 
ship_tab.date) 

[0288] FROM salesjab, order_Ub, part_tab, pricetab, 
ship_tab 

[0289] WHERE sales_tab.invoice_number-order_tab- 
,invoice_number 

[0290] AND sales_tab.invoice_number-part_Ub.in- 
voice_number 

[0291] AND sales_tab. invoice_number-price_tab- 
.invoice_number 

[0292] AND sales_tab.invoice_number-ship_tab.in- 
voice_number. 

[0293] Because the tablespace in the enable column com- 
mand was not enabled, the default tablespace is used to 
create side tables. If the tablespace is specified and it does 
exist in the database, then the side tables will be created in 
the specified side tables. 

[0294] D.8 Inserting XML Documents 

[0295] For XML columns, an entire XML document is 
always stored as the column data. The insertion can be 
achieved in the following ways: 

[0296] Using the default cast function: 

[0297] For each UDT, there is a default cast function 
to convert the SQL base type to the UDT. The 
following cast functions can be used in a VALUES 
clause. 





Input 






Default UDT 


Parameter 


Return 




Cast Function 


Type 


Type 


Description 


db2xmlJCMLVbrcharO 


varchar 


XMLMirchar 


Input from mem- 
ory buffer of 
varchar 


db2xml.XMLCLOB0 


ciob 


XMLCLOB 


Input from mem- 
ory buffer of ciob 


db2xmlJCMLDBCLOB0 dbclob 


XMLDBCLOB Input from mem- 








ory buffer of 








dbclob 


db2xml.XMLFilc() 


varchar 


XMLFile 


Only store file 
name 


db2xmI.XMLURL0 


data link 


XMLURL 


data type 
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[0298] The following SQL statement inserts the 
casted varchar type in the host variable xml_buff into 
the XMLVarchar. 

[0299] INSERT INTO salesjab 

[0300] VALUESC 123456% 'Sriram Srinivasan', 
db2xmLXMLVarchar(:xml_buff)) 

[0301] Using the Storage UDF: 

[0302] For each XML System UDT, there is a storage 
(or import) UDF to import data from a resource other 
than its base type. For example, if to import an XML 
document in a tile to the database as a XMLVarchar, 
then the function XML VarcharFromFile( ) is used. 

[0303] In the example below, a record is inserted into 
the sales_tab table. The function XMLVarcharFrom- 
File( ) imports the XML document from a file into 
the database and stores it as a XMLVarchar. 

[0304] EXEC SQL INSERT INTO salesjab 
VALUES(' 123456', 'Sriram Srinivasan' XMLVarchar- 
FromFile(*/home/ul/xml/order.xmr)) 

[0305] The above example imports the XML object from 
the file a /homeAil/xml/order.xmr to the column order in the 
table salesjab. 

[0306] D.9 Retrieving XML Documents 

[0307] The XML table is ready to use when the XML 
column is enabled. Retrieving an XML column directly 
returns the UDT as the column type. A user can always use 
the default cast function provided by The database for 
distinct types to convert a UDT to an SQL base type, then 
operate on it. In addition to that, a user can also use 
overloaded UDF Content( ) to retrieve document content 
from a file or URL to a memory buffer. 

[0308] Using the default cast function: 

[0309] The following cast functions, which are auto- 
matically created by the database for the XML UDT, 
may be used in a SELECT statement 



Input 

Default Cast Parameter Return 

Function Type Type Description 

db2xmLvaich&rQ XMLVarchar varchar XML document in variable 

length of char 

db2xml.clob0 XMLCLOB dob XML document in CLOB 
db2xml.dbclob() XMLDBCIjOB dbclob XML in double byte CLOB 
db2xm l.varcharQ XMLFtle varchar XML filename in variable 

length of char 

db2xml.datalink0 XMLURL datalink URL of XML document 



[0310] The following SQL statement shows how to 
use the default cast function in a simple query. 

[0311] EXEC SQL SELECT 

db2xml.varchar(order)from salesjab 

[0312] Using the contents UDF: 

[0313] Suppose XML documents are stored as XML- 
File or XMLURL, to operate on these XML docu- 



ments in memory, the UDF contents, which takes 
XMLFile or XMLURL as input and returns a varchar 
or CLOB, is used. 

[0314] In the example below, a small sqc program 
segment illustrates how an XML document is 
retrieved from a file to memory. This example 
assumes that the column order is of XMLFile type. 



EXEC SQL BEGIN DECLARE SECTION; 

varchar(3k) xmLbuff; 
EXEC SQL END DECLARE SECTION; 
EXEC SQL CONNECT TO mydb 
EXEC SQL DECLARE cl CURSOR FOR 
SELECT Contcnt(ordcT) from sales.tab 

WHERE salesperson - 'Sriram Srinivasan' 
EXEC SQL OPEN cl; 
do{ 

EXEC SQL FETCH cl INTO rxml buff; 
if (SQLCODE !- 0) { 

break; 

} 

else { 

/• do whatever is needed to do with the XML doc in buffer */ 
} 

} 

EXEC SQL CLOSE cl; 



[0315] D.10 Updating XML Documents 

[0316] With the XML System, an entire XML document 
can be updated by replacing the XML column data. The 
XML System provides two techniques for update: 

[0317] Using cast Unctions or storage UDFs in the set 
clause of the SQL update statement: 

[0318] In this case, a cast function or a UDF is used 
in the Set clause. Here is an example: 

[0319] UPDATE salesjab 

[0320] set order-XMLVarcharFromFile^ome/uV 
xml/order2.xml f ) 

[0321] WHERE sales j>erson-'Sriram Srinivasan' 

[0322] Using the Update( ) UDF: 

[0323] The XML System provides a UDF Updaie( ) 
which allows a user to specify a location path and the 
value of the element or attribute represented by the 
location path to be replaced. In this case, a user does 
not need to retrieve the XML document and use an 
editor to change the content. The XML System will 
do it automatically. 

[0324] Here is an example of using the UDF Update( 
). In this example, the content of "/Order/Customer" 
is updated to NewMart". 

[0325] UPDATE salesjab set order«Update(order,7 
Order/Customer YNewMart') 

[0326] WHERE sales _person=' Sriram Srinivasan' 

[0327] For an XML Column, the XML System will update 
side tables of extracted data when the XML column is 
updated. However, a user should not update these side tables 
directly without updating original XML documents stored in 
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the XML column by changing the corresponding XML 
element or attribute value. Otherwise, there may be data 
inconsistency problems. 

[0328] D.ll Retrieving XML Element Contents and 
Attribute Values 

[0329] For XML columns, the XML System provides a 
UDF to extract element or attribute values from entire XML 
documents. The retrieval is performed on an XML docu- 
ment. It is a single document search. The XML System 
provides extracting UDFs to retrieve XML elements or 
attributes in the SQL select clause. This is very useful after 
search filtering on a collection of XML documents to further 
obtain desired elements or attributes. 

[0330] Suppose there are more than 1000 XML documents 
stored in the column order in the table sales_tab. To find all 
customers who have ordered items which have the Extend- 
edPrice greater than $2500.00, the following SQL statement 
with the extracting UDF in the select clause can be used: 

[0331] SELECT extractVarchar(Order,70rder/Cus- 
tomer') from sales_order_view 

[0332] WHERE price>2500.00 

[0333] where the UDF extract Varchar( ) takes the order as 
the input, and the location path "/Order/Customer" as the 
select identifier, and returns the names of the customer. Note, 
in this statement, only the orders with ExtendedPrice greater 
than $2500, say maybe 11 such orders, will be the input to 
the extracting function. The WHERE clause did the filtering 
on the collection of 1000 XML documents already. Again, 
the sales_order_view is the default view to join the appli- 
cation table sales_tab and all its side tables, where price is 
the part tab.price, representing the "/Order/Part/Extended- 
Price". 

[0334] D.12 Searching an XML Document 

[0335] The above sections have described how the XML 
System may be used as a document repository for storage 
and retrieval, as well as for element or attribute selection. 
Here, searching using indices created on side table columns, 
which contain XML element contents or attribute values 
extracted from XML documents, is illustrated Since the data 
type of an element or attribute can be specified, searches can 
be performed on SQL general data types and range searches 
can be performed. 

[0336] D.13 Search from Join View 

[0337] If desired and specified when an XML column is 
enabled, the XML System provides a default read-only view 
which joins the application table with all created side tables 
through the same unique identifier. With the default view, or 
any view created by the application, a user can search XML 
documents by a query on the side tables. 

[0338] The above examples have referenced an applica- 
tion table salcs^tab and side tables order_tab, part_tab and 
ship_tab. The name of a default view sales_order__view is 
specified at the enabling column time. XML System had 
created a default view sales_order_view which joins these 
tables by the statement shown in the previous section. 

[0339] The following example SQL statement will return 
the sales_persons of the sales_tab who have line item orders 
stored in the column order where the ExtendedPrice is 
greater than $2500.00. 



[0340] SELECT salesperson FROM sales_order_view 

[0341] WHERE price>2500.00 

[0342] The advantage of a query on the join view is that 
it provides a virtual single view of the application table 
and side tables. However, when more side tables are 
created, the more expensive the query will be. There- 
fore, it is only recommended when the total number of 
side table columns is small. An application can create 
a desired view by joining important side table columns 
for optimization. Note that the root_id, which can be 
the specified primary key in the application table or the 
DXXROOTJD created by the XML System, provides 
the way to join tables. 

[0343] D.14 Direct Query on Side Tables 

[0344] Since the DAD is specified by the application, the 
side tables created by the XML System are known to the 
application programmer. For better performance, an appli- 
cation can do query or sub-query on side tables directly. The 
following example shows how to do so for the same query 
stated above: 

[0345] SELECT salesperson from salesjab 

[0346] WHERE mvoicejromber in 

[0347] (SELECT invoice_number from part_tab 
WHERE price>2500.00) 

[0348] Mote that the invoice_number is the primary key in 
the application table sales_tab. The advantage of direct 
query with sub-query is better performance. When side 
tables have parent-children relationships, direct query with 
sub-query often make more sense. 

[0349] D.15 Query Using UDF 

[0350] In one embodiment, the side tables are created by 
the DAD, and indices are created for columns in the side 
tables. Therefore, the search will be fast with indexing. 

[0351] In another embodiment, it is not required that a user 
create side tables or indices on columns of side tables. The 
application still can use the extracting UDFs to do the query. 
Since each extracting UDF will do the source scan, it is very 
expensive. It should be used when other restrictions are 
applied to the WHERE clause so that the source scan is 
performed to a limited number of XML documents. 

[0352] Here is an example: 

[0353] SELECT sales_person from salesjab 

[0354] WHERE extractVarchar(order,70rder/Cus- 
tomer') like ^NewMart^fc* AND invoice_num- 
ber>100 

[0355] D.16 Search on an Element or Attribute with Mul- 
tiple Occurrences 

[0356] In XML documents, one element name type may 
occur multiple times. Since attributes belong to elements, 
the same location path of an attribute may often refer to 
multiple values. The term "multiple occurreDce" will be 
used to specify this case. 

[0357] In the DAD, a user can specify whether the location 
path will have multiple occurrence. In the above DAD 
example, the "/Order/Part/price" has multiple occurrence, 
and the side table price_tab was crated for it. It is possible 
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to have multiple rows in the part_tab table containing the 
same invoice_oumber. Therefore, a user should only select 
the distinct values. The following provides an example of 
how to do query for this case: 

[0358] SELECT sales_person from sales tab 

[0359] WHERE invoice_number in 

[0360] (SELECT DISTINCT invoice jromber 
from price Jab WHERE price>2500.00) 

[0361] On the other hand, since XML System provides 
additional column DXX_SEQNO in the pricejab, a user 
can select a price and pair it with the corresponding Ship- 
Date. The following is an example: 

[0362] SELECT price tab.price, shipjab.date from 
pricejab, ship Jab 

[0363] WHERE priceJab.invoice_number-«ship_tab- 
.invoice number AND price tab.DXX_SEQNO- 
shipjabTDXXSEQNO 

[0364] A user can also select the price ordered by the 
sequence number, as illustrated in the following example: 

[0365] SELECT price price Jab ORDER by DXX SE- 
QNO 

[0366] D.17 Structural-text Search 

[0367] In one embodiment of the invention, the structural- 
text or full text search is performed after enabling XML 
columns with Text Extender, a product from International 
Business Machines, Corporation. 

[0368] In the examples discussed herein, to perform struc- 
tural-text on the column order, a user can enable the column 
with the Text Extender, by specifying a text handle name, 
say "order! Iandle". Then with the Text Extender's section 
search support, the XML document with the word "XYZ" in 
the section "/Order/Customer" can be found. The following 
example shows how: 

[0369] SELECT order FROM sales Jab 

[0370] WHERE Contains(orderHandle, 'model Order 
section(/Order/Customer) "\XYZ\"')-1 

[0371] Where Order is the model name and Order/Cus- 
tomer is the section name. 

[0372] D.18 Deleting XML Documents 

[0373] Deleting a row from an XML table is done with a 
SQL DELETE statement. A user can use the search tech- 
nique discussed above to specify the WHERE clause. 

[0374] The following is a simple example: 

[0375] DELETE from sales Jab 

[0376] WHERE invoice jiumber in 

[0377] (SELECT invoice_number from part Jab 
WHERE price>2500.00) 

[0378] D.19 Disable Columns 

[0379] The disable j»lumn option disables the XML 
enabled column. The following is the syntax for disabling a 
column: 

[0380] dxxadm disable_column db_name tab_name 
column name 



[0381] The following are the arguments for disable_col- 
umn: 

[0382] db_name: the database name, 

[0383] tab_name: the table name in which the XML 
column resides. 

[0384] column_name: the name of XML column. 

[0385] The following actions are performed by disable- 
_column: 

[0386] Delete the entry from the XMLJJSAGE 
table. 

[0387] Decrement the USAGE_COUNT in the 
DTDREF table. 

[0388] Drop all triggers created with this column. 

[0389] Drop side table associated with this column. 

[0390] In one embodiment, a user must disable an XML 
column before dropping an XML table. If an XML table is 
dropped, but its XML column is not disabled, then all side 
tables created by the XML System will not be dropped. This 
may cause problems for the XML system to keep track of the 
number of enabled XML columns. 

[0391] D.20 Detailed Techniques 

[0392] The server code is the core of XML System. It has 
several major components, and each one performs a unique 
role in the product. 

[0393] The admin stored procedures are used to "xmlaUy" 
enable and disable the database, columns and indices. For 
performance and simplicy, these stored procedures were 
written in the embedded SQL. 

[0394] The XML System provides a number of functions 
in the server code. The functions are: dxxEnableDB, 
dxxDisableDB( ), dxxEnableColumn( ), dxxDisableCol- 
umn( ), and dxxEnableCollection( ). 

[0395] The dxxEnableDB stored procedure enables a data- 
base for XML document access. It uses the DDL statements 
to create XML System UDTs, a set of external UDFs, a set 
of internal UDFs, the DTD reference table, and the 
XMLJJSAGE table. The implementation of these UDFs are 
in the UDFs component. 

[0396] The dxxDisableDB( ) stored procedure drops 
everything created by the dxxEnabeDB( ). It does error 
checking on DTDREF and XMLJJSAGE tables. 

[0397] The dxxEnableColumn( ) stored procedure enables 
an XML column of the XML System UDT. It parses the 
input DAD, create side tables, and triggers according to the 
DAD. It also updates the XMLJJSAGE table. 

[0398] The dxxDisableColumn( ) stored procedure dis- 
ables an XML column. It deletes all side tables created by 
the XML System and updates the XMLJJSAGE table. 

[0399] The dxxEnableCollection( ) stored procedure 
enables an XML collection. It inserts a new row in the 
XMLJJSAGE table and stores the input DAD there. It 
checks or creates collection tables according to the DAD. 

[0400] The design description comprises program func- 
tions that implement the stored procedures in the source. 
These are listed below: 
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[0401] dxxdb-sqc 

[0402] 1. dxxEnableDB( ) 



function name 


caller 


input 


output 


dxxEaablcDBO 


mainQ in 


dbName 


cnCodc, 


(cnablc_db) 






errMsg 



[0403] The following is a Functional Description: 

[0404] 1. Initialize input and output parameters form 
sqlda, and sqlca. 

[0405] 2. Loop through enable statement array to 
execute each DDL. 

[0406] 3. Check whether database is DBCS enabled. 

[0407] 4. If DBCS enabled, loop to execute each 
DBCS enable DDL statement 

[0408] 5. error check. 

[0409] 6. Set output parameter. 

[0410] 2. dxxDisableDB( ) 



function came 


caller 


input 


output 


dbocDiaableDBO 


tnainO in 


dbName 


enCbde, 


(disable_db) 


dxxadm.sqc 




en Msg 



[0411] The following is a Functional Description: 

[0412] 1) Loop through disable statement array to 
execute each DDL. 

[0413] 2) Check whether database is DBCS enabled. 

[0414] 3) If DBCS enabled, loop to execute each 
DBCS disable DDL statement. 

[0415] 4) Error initialize input and output parameters 
form sqlda, and sqlca check. 

[0416] 5) Set output parameter. 

[0417] 3. IsDBCS_DB( ) 



function name 


caller 


input 


output 


isDBCS_DB() 


dxxEnableDBO, 


dbName 


TRUE or FALSE 




dzzDisableDBQ 







[0418] The following is a Functional Description: 

[0419] Check whether database is DBCS enabled, if so, 
return TRUE, otherwise return FALSE. 

[0420] dxxcol^qc 

[0421] 1. dxxEnableColumn( ) 



function name caller input output 

dxxEnabteColumn mainQ in 2. dbName, 1. crrCode 

dxxndWsqc 3. tabName, Z errMsg 

4. colName, 

5. dadBuf, 

6. tablespace, 

7. de&ultView; 

8. rootID, 



[0422] The following is a Functional Description: 

[0423] 1. Initialize host variables. 

[0424] 2. Initialize output parameters. 

[0425] 3. Call getParameter( ) to get input param- 
eters, 

[0426] Call check_table( ) to check table name, 

[0427] Call check_column( ) to check input col- 
umn name, and 

[0428] Check rootID. 

[0429] 4. Initialize XML4C parser. 

[0430] 5. Parse DAD. 

[0431] 6. Call dad_popu( ) to populate DAD into 
internal data structure. 

[0432] 7. Error checking on DAD: 

[0433] access_mode, 

[0434] DTDID 

[0435] 8. Get colno of this column from syscat, used 
as suffix of triggers. 

[0436] 9. Call createSideTables( ) to create side 
tables. 

[0437] 10. Create triggers on user tables: 

[0438] 1. rootidTrigger_BIT, 

[0439] 2. insertTrigger_ArT, 

[0440] 3. deleteTrigger_ADT, 

[0441] 4. updateTrigger_AUT, 

[0*42] 5. validateTrigger_VIT, 

[0443] 6. validateTriggcr_VUT 
[0444] 11. Create default view. 
[0445] 12. Insert a row into XML_USAGE table. 
[0446] 13. Update DTD_REF table. 
[0447] 14. Set output error message. 
[0448] 15. Error check on commit. 
[0449] 16. Free DAD structure. 
[0450] 2. dxxDisableColumn( ) 
[0451] function name caller input output 
[0452] dxxDisableColumn main( ) in dxxadm.sqc 
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[0453] 1. dbName, 

[0454] 2. tabName, 

[0455] 3. colName, 

[0456] 1. errCode 

[0457] 2. errMsg 

[0458] The following is a Functional Description: 

[0459] 1. Initialize host variables. 

[0460] 2. Initialize output parameters. 

[0461] 3. Call getparameter( ) to get input param- 
eters. 

[0462] 4. Check input parameters: 

[0463] 1. call check_table( ) to check table name, 
and 

[0464] 2. call check_column( ) to check input 
column name. 

[0465] 5. Get cohiwnn name, DAD, DTDID, default- 
View, Trigger suffix from XML_USAGE table. 

[0466] 6. Check whether column is XML enabled and 
trigger suffix exists. 

[0467] 7. Drop default view. 

[0468] 8. Parse DAD and populate DAD data struc- 
ture. 

[0469] 9. Using DAD structure to delete all side 
tables. 

[0470] 10. Call createSideTables( ) to create side 
tables. 

[0471] 11. Drop triggers on user tables: 

[0472] 1. reset DXXROOTJD, drop rootidTrig- 
ger_BIT, 

[0473] 2. insertTrigger_AIT, 

[0474] 3. deleteTriggerADT, 

[0475] 4. updateTrigger_AUT, 

[0476] 5. validateTrigger_VIT, and 

[0477] 6. validateTrigger_VUT. 
[0478] 12. Update DTD_REF table. 
[0479] 13. Delete the row into XML_USAGE table. 
[0480] 14. Set output error message. 
[0481] 15. Error check on commit. 
[0482] 16. Free DAD structure. 
[0483] 3. check_uble<) 



[0484] The following is a Functional Description: 

[0485] Check whether the table exists in the database by 
looking at the syscat.cohimos. 

[0486] 4. cbeck_column( ) 



function name 


caller 


input 


output 


check_column 


dzz£nableColumn(X 


1. table name 


, 1. errCode 




dxxDisableColumnQ 


2. column 
name 


2. errMsg 



[0487] The following is a Functional Description: 

[0488] Check whether the column exists in the right table 
by looking at the syscatcolumns. 

[0489] 5. getParameter( ) 

[0490] function name caller input output 

[0491] getParameter dxxEnableCohimn( ), 

[0492] dxxDisableCblumn( ) 

[0493] 1. in_sqfvar 

[0494] 1. data, 

[0495] 2. errCode 

[0496] 3. errMsg 

[0497] The following is a Functional Description: 

[0498] Extract parameter data from in_sqlvar, according 
to SQLTYPE. 



[0499] 6. createSideTable( ) 


function name caller 




input 


output 


creaicS Ldc dxxEnableColumn() 


1. 


pDAD, 


1. errCode 


TablcO 


2. 


rootid, 


2. errMsg 




3. 


rootid 








definition 






4. 


tablespacc, 





[0500] The following is a Functional Description: 

[0501] This routine creates all side tables specified in the 
DAD. It takes the pDAD (pointer to DAD) data structure, 
looping the list of side tables, and generates the "CREATE 
TABLE" statement. 

[0502] 1. If the rootid is not specified, 

[0503] then add DXXROOTJD as a not-null col- 
umn, 



function name caller input output 

[0504] else add the primary key in user table as a 

chcck_table dxxEnableColumn0, 1. dbName, 1. errCode not-null column 

dxxDisablcColumnO 2. errMsg 

[0505] 2. If the tablespacc is specified, add "IN" 

tablespace to the CREATE TABLE statement. 
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[0506] 7. rootidTrigger_BIT 



function name caller 


input 


output 


rootidTrigger_BIT dbotEnableColumnQ 


1. pDAD 


1. cnCode 






2. crrMsg 



[0507] The following is a Functional Description: 

[0508] This routine creates the Before Insert Trigger (BIT) 
to add the value of DXXROOT ID, which is generated from 
the generate_unique( ) function. 

[0509] 1. Check syscat.triggers to see whether the 
BIT exists or not, if exist, return error. 

[0510] 2. Execute the create trigger statement as: 

[0511] CREATE TRIGGER userJable.BIT 
BEFORE INSERT ON userjable 

[0512] REFERENCING NEW AS NEW 

[0513] FOR EACH ROW MODE DB2SQL 

[0514] WHEN (xmlcolumn IS NOT NULL) 

[0515] BEGIN ATOMIC SET 

[0516] NEWROW.DXXROOT_ID-generate_unique( ) 
END 

[0517] where userjable and xmlcolumn are taken 
from pDAD. 

[0518] 8. insertTrigger_AIT 



function name caller 


input 


output 


inse rtTriggc r^ATT dxxEnablc 


1. pDAO 


1. enCode, 


ColumnQ, 


2, trigger suffix, 


2. errMsg 




3. rootid, 





[0519] The following is a Functional Description: 

[0520] This routine creates the After Insert Trigger (ATT) 
to populate the side tables after a row is inserted into the user 
table with an XML column. 

[0521] Loop through pDAD->s_table, for each sjable, 
pDADst do: 

[0522] For each column in the pDADst do: 

[0523] 1. If (pDADst- >col is not multiple occurred) 

[0524] then set the triggerstmt to: 

[0525] INSERT INTO side Jab VALUES 

[0526] (NEWROWrootid, 

db2xml.extractDataType(xmIcolumn,path)) 

[0527] else set the trigger stmt to: 

[0528] INSERT INTO sidejab 

[0529] SELECT NEWROWrootid, 
db2xml.seqno( ), 

[0530] RETURNED DataType FROM 



[0531] TABLE(db2xml.exttactDaUTypes 
(xmlcohimn,path))x 

[0532] 2. Execute the statement: 

[0533] "CREATE TRIGGER userJabAITtrigger- 
suffix 

[0534] AFTER INSERT ON userjab 

[0535] REFERENCING NEW AS NEWROW 

[0536] FOR EACH ROW MODE DB2SQL 

[0537] WHEN (xmlcolumn IS NOT NULL) 

[0538] BEGIN ATOMIC 

[0539] trigger_stmt; 

[0540] END" 

[0541] where userjab, sidejab, xmlcolumn and 
path are getting from pDADst, and DataType is 
getting from the call of mapType(pDADst->col- 
>type) 

[0542] Note: the db2xml.seqno( ) is a UDF to generate a 
sequence number of multiple occurrence. 

[0543] 9. deleteTrigger_ADT 



function name 


caller 


input 


output 


delcteTrigger_ADT 


dxxEnable- 


1. pDAD 


1. enCode 




ColumnQ 


2. trigger^suffix, 


2. errMsg 






3. rootid, 





[0544] The following is a Functional Description: 

[0545] This routine creates the After Delete Trigger (ADT) 
to delete rows in side tables after a row is deleted from the 
user table with an XML column. 

[0546] Loop through pDAD->s_table, for each sjable, 
pDADst do: 

[0547] execute the statement: 

[0548] "CREATE TRIGGER userJab.ADTtrig- 
ger_suffix 

[0549] AFTER DELETE ON userjab 

[0550] REFERENCING OLD AS OLDROW 

[0551] FOR EACH ROW MODE DB2SQL 

[0552] BEGIN ATOMIC 

[0553] DELETE FROM sidejab WHERE 

[0554] OLDROW.rootid-sidetab.rootid 

[0555] END" 

[0556] where userjab, sidejab, xmlcolumn and 
path are getting from pDADst 
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[0557] 10. updateTrigger AUT 



function name 


caller 


input 


output 


updateTrigger_AUT 


dxxEnablc- 


1. pDAD, 


1. data, 




CohimnQ 


2. trigger suffix, 


2. crrCode 






3. rootid 


3. errMsg 



[0658] The following is a Functional Description: 

[0559] This routine creates the After Update Trigger 
(AUT) to update rows in side tables after a row is updated 
in the user table with an XML column. 

[0660] Loop through pDAD->s_table, for each sjable, 
pDADst do: 

[0661] For each column in the pDADst do: 

[0562] If (pDADst->col is not multiple occurred) 

[0563] then set the trigger_stmt to: 

[0564] UPDATE side Jab SET setstmt 
WHERE NEWROW.rootid-sideJab.rootid 

[0565] else set the trigger_stmt to: 

[0566] DELETE FROM sidejab WHERE 
NEWROWrootid sidejab.rootid; 

[0567] INSERT INTO sidejab 

[0568] SELECT NEWROWrootid, 
db2xmLseqno( ), 

[0569] RETURNED Dataiype FROM 

[0570] 

TABLE(db2xml.extractDataTypes(xmlcolumn, 
path))x 

[0571] Execute the sUtement: 

[0572] "CREATE TRIGGER userJab-AUTtrig- 
ger_suffix 

[0573] AFTER UPDATE ON userjab 

[0574] REFERENCING NEW AS NEWROW 

[0575] FOR EACH ROW MODE DB2SQL 

[0576] WHEN (xmlcolumn IS NOT NULL) 

[0577] BEGIN ATOMIC 

[0578] trigger_stmt, 

[0579] END" 

[0580] where userjab, sidejab, xmlcolumn and 
path are getting from pDADst, and DataType are 
getting from the call of mapiype(pDADst->col- 
>type) 

[0581] 11. validation Trigger_VBIT 



function name caller input output 

validationTriggcr_VBrr dxxEnable- 1. pDAD, 1. enCode, 

ColumnQ 2. trigger_suffix 2. errMsg 



[0582] The following is a Functional Description: 

[0583] This routine create a Validation Before Insert Trig- 
ger (VBIT) to validate an input XML document before 
inserting it into a user table. Due to the use of XML4C 
parser, it retrieves the DTD from did _ref table and puts it in 
an external file, then calls the UDF db2xml.validate in the 
trigger. 

[0584] It executes the following statement: 

[0585] "CREATE TRIGGER userJab.VBITtrigger- 
jsuflix 

[0586] BEFORE INSERT ON userjab 

[0587] REFERENCING NEW AS NEWROW 

[0588] FOR EACH ROW MODE DB2SQL 

[0589] WHEN validation !-0 

[0590] SIGNAL SQLSTATE 'DXX.SQLSTATE- 
JNVAIJD J>OC (DXX_0000E) 

[0591] where validation- 

[0592] SELECT 

db2xml.validate(NEWROWxmlcolumn, db2xml. 
content(content, tmpfileName), pDAD->dtdid) 

[0593] FROM db2xml.dtdjef WHERE dtdid- 
pDAD->dtdid) 

[0594] where user_tab, xmlcolumn are getting from 
pDAD, tmpefileName is set by this routine, and the 
"content" is the column name in dtdjef for DTD. 
db2xml.content( ) is a UDF. 

[0595] 12. vahdauonTriggerJVBUT 



function name 


caller 


input 


output 


validatiDnTrigger_VBUT 


dxxEsable- 


1. pDAD, 


1. enCode 




ColumnO 


2. trigger_juffix 


2. errMsg 



[0596] The following is a Functional Description: 

[0597] This routine create a Validation Before Update 
Trigger (VBUT) to validate an input XML document before 
updating it in user table. Due to the use of XML4C parser, 
it retrieves the DTD from dtdjef table, puts it to an external 
file, then calls the UDF db2xml. validate in the trigger. 

[0598] It executes the following statement: 

[0599] "CREATE TRIGGER userjab. VBUTtrigger- 
_suffix 

[0600] BEFORE UPDATE ON userjab 
[0601] REFERENCING NEW AS NEWROW 



[0602] FOR EACH ROW MODE DB2SQL 
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[0603] WHEN validation !«0 

[0604] SIGNAL SQLSTATE 1 DXX_SQLSTATE- 
JNVALID_DOC' (DXX_0000E) 

[0605] where validation- 

[0606] SELECT 
db2xml.validate(NEWROW.xmloolumn, 
db2xml.content(contentt mpfileNamc), pDAD->dt- 
did) 

[0607] FROM db2xmLdtd_rcf WHERE dtdid- 
pDAD->dtdid) 

[0608] where userjab, xmlcohimn are getting from 
pDAD, tmpefileName is set by this routine, and the 
"content" is the column name in dtd_ref for 
DTD.db2xml.content< ) is a UDF. 

[0609] 13. createDefaultView 



function same 


caller 


input 


output 


createDefaultView 


dxxEnabl eColumoQ 


1. pDAD, 


1. enCode 






2. rootid, 


2. crrMsg 






3. tablename, 








4. default_view 





[0610] The following is a Functional Description: 

[0611] This routine creates a default view which joins the 
user table and XML column side tables together with the 
name specified as the input parameter default_view. The key 
here is to join by the rootid, which can be the 
DXXROOTJD or the primary key of user table. As the 
input to this routine, the rootid is used as the column name 
forjoin. 

[0612] 1. Declare a cursor on the statement: 

[0613] SELECT colname FROM syscat.column 
WHERE tabname-tablename execute the statement, 
open the cursor and tetch on the cursor, while (! end 
of fetch) { 

[0614] get column and append userjab .column to 
string userTableColumns append tablename to string 
alltablenames, 

[0615] 2. Looping through the pDAD structure, 

[0616] for each side table pDADst do: 

[0617] append side table name pDADst->stbname 
to alltablenames, 

[0618] append to string join_condition with "table- 
n am e. rootid ~pDADst->stb name. rootid"; 

[0619] loop through the pDADst->col list, 

[0620] for each column in the side table do: 

[0621] append the pDADst->stbmaine.p- 
DAdst->scolname to the string sideTableCol- 
umns; 



[0622] 3. execute the create view statement: 

[0623] CREATE VIEW default_view AS 

[0624] SELECT usertableColumns sideTableCol- 
umns FROM alltablenames WHERE join_condition 

[0625] D.21 Flow Diagrams 

[0626] FIG. 4 is a flow diagram illustrating steps per- 
formed by the XML System in creating and maintaining 
XML document data as column data. In Block 400, the XML 
System creates a table with an XML column having a XML 
column type. The table is created in response to a CREATE 
TABLE statement that specifies the XML column . In Block 
402, the XML System enables the XML column. Next, the 
XML System, in Block 404, creates side tables using a Data 
Access Definition for the XML column. In Block 406, the 
XML System creates triggers for Insert, Update, and Delete 
on the XML column, so that the side tables are populated 
when the main table is populated and the side tables are 
modified when the main table is modified. Thus, the main 
table and side tables are synchronized. In Block 408, when 
data is inserted into the main table, the XML System inserts 
data into the side tables. In Block 410, when the main table 
is modified (i.e., data is updated or deleted), the XML 
System modifies the side tables. 

[0627] FIGS. 5 and 6 illustrate key aspects of an embodi- 
ment of the invention. In particular, these figures illustrate 
enabling a column and disabling a column. 

[0628] FIG. 5 is a flow diagram of steps performed by the 
XML System to enable a column. In block 500, the XML 
System initializes all variables. In block 502, the XML 
System gets and checks input parameters. In block 504, the 
XML System calls a XML4C parser to parse a DAD. In 
block 506, the XML System determines whether the root id 
is input by an application. If not, the XML System continues 
to block 508, otherwise, the XML system continues to block 
510. In block 508, the XML System creates side tables with 
DXXROOTJD. In block 510, the XML System creates side 
tables with the user table's primary key as the rootjd. In 
block 512, the XML System creates the rootjd, insert, 
delete, and update triggers on user tables. 

[0629] In block 514, the XML System determines whether 
the DAD specifies validation. If so, the XML System 
continues to block 516, otherwise, the XML System con- 
tinues to block 518. In block 516, the XML System creates 
validation triggers. In block 518, the XML System deter- 
mines whether a default view is input by the application. If 
so, the XML System continues to block 520, otherwise, the 
XML System continues to block 522. In block 520, the XML 
System creates a default view. In block 522, the XML 
System inserts an entry into XML USAGE TABLE. In 
block 524, the XML System updateslhe DTD_REF. 

[0630] FIG. 6 is a flow diagram of steps performed by the 
XML System to disable a column. In block 600, the XML 
System initializes all variables. In block 602, the XML.Sys- 
tem gets and checks input parameters. In block 604, the 
XML System gets the DAD, DTDID, and default view from 
the XML_USAGE TABLE. In block 606, the XML System 
determines whether the default view is null. If si, the XML 
System continues to block 610, otherwise, the XML system 
continues to block 608. In block 608, the XML System drops 
the default view. 
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[0631] In block 610, the XML System parses the DAD to 
get side table names. In block 612, the XML System drops 
all side tables. Id block 614, the XML System drops the 
root_id. insert, delete, and update triggers on user tables. In 
block 616, the XML System determines whether the DAD 
specifies validation. If so, the XML System continues to 
block 618, otherwise, the XML System continues to block 
620. In block 616, the XML System creates validation 
triggers. In block 620, the XML System deletes the entry 
from the XML_USAGE TABLE. In block 622, the XML 
System updates the DTD REF table. 

E. Generating One or More XML Documents From 
a Single SQL Query 

[0632] In one embodiment of the invention, an XML 
System is provided that generates one or more XML docu- 
ments from a single SQL query. This technique is referred to 
as "SQL mapping". The XML System retrieves data in 
existing relational database tables and forms a set of one or 
more XML documents. Using the XML System, application 
programs can turn existing business data into one or more 
new XML documents to be interchanged from business to 
business via a network, such as the internet or an intranet. 

[0633] The XML System takes a single SQL query, along 
with a definition of the data model from which one or more 
XML documents are to be generated (i.e., a DAD), and 
forms one or more XML documents using the data in 
existing database tables which meet the query condition. 

[0634] The XML System is implemented by stored pro- 
cedures which can be called from the database client code. 
The stored procedures take a Data Access Definition (DAD), 
which consists of the SQL query, the Extensible Markup 
Language Path (XPath) data model based definition of the 
document structure to be generated, and a table name which 
will contain the generated one or more XML documents as 
its row data. The stored procedures use a heuristic technique 
to eliminate duplication from the SQL query. Additionally, 
the stored procedure identifies the relational hierarchy of the 
SQL query and maps the data obtained from the SQL query 
into elements and attributes of generated one or more XML 
documents. 

[0635] An XcoUecuon defines how to compose one or 
more XML documents from a collection of relational tables. 
An XML collection is a virtual name of a set of relational 
tables. Applications can enable an XML collection of any 
user tables. These user tables can be existing tables of legacy 
business data or ones newly created by the XML System. A 
user can access XML collection data through the stored 
procedures provided by the XML System. 

[0636] An XML collection is used to transform data 
between database tables and one or more XML documents. 
An XML collection achieves the goal of data interchange via 
XML. For applications that want to compose one or more 
XML documents from a set of relational tables, the XML 
System offers a technique to enable an XML collection 
through a Document Access Definition (DAD). In the Docu- 
ment Access Definition, applications can make a custom 
mapping between database column data in new or existing 



tables to XML elements or attributes. The access to an XML 
collection is by calling the XML System's stored procedures 
or directly querying the tables of the collection. 

[0637] E.1 Example 

[0638] The following discussion provides an example of 
generating one or more XML documents from a relational 
database using an SQL query and a simple DAD. In par- 
ticular, a relational database is illustrated. Then, an SQL 
query is illustrated that is used to retrieve data from the 
relational database. Next, the results of the SQL query are 
illustrated. Moreover, the Document Access Definition 
(DAD), which contains the SQL query is illustrated, along 
with a Document Type Definition (DTD). After this, one 
XML document that is generated to contain the data 
retrieved by the SQL query is illustrated. 

[0639] Relational Database: 







order.tab: 






order Joey 


customcr.m 


ime cnatomer.email 


customer phone 


1 


General Motor parts@gm.com 


800-GM-PAKTS 






part, tab 






partjtcy 


color 


qty price 


tax 


order.kcy 


156 


red 


17 17954.55 


0.02 


1 


68 


black 


36 34850.16 


0.06 


1 


128 


red 


28 38000.00 


0.07 


1 






ship Jab: 






date 


mode 


comment 




part.key 


1998-03-13 


TRUCK This is the first shipment to 


156 






service of GM. 






1999-01-16 


FEDEX 


This is the second shir. 


iment to 


156 






service of GM. 






1998-08-19 


BOAT 


This shipment is requested 


68 






by a call, from GM 










myrfc^-Wngi 






1998-08-19 


AIR 


This shipment is ordered by 


68 






an emaiL 






1998-12-30 


TRUCK 


NULL 




128 



[0640] The following is an SQL query. The SELECT term 
selects columns. The FROM term indicates the tables from 
which data is to be selected. The WHERE term indicates the 
conditions for selecting data. This SQL query is defined in 
a Document Access Definition, which is illustrated below. 

[0641] SELECT o.orderjcey, customer_name, cus- 
tomer^email, p.partjeey, color, qty, price, tax, 
ship_id, date, mode 

[0642] FROM order Jab o, partjab p, table(select 
substr(char(timestamp(generate_unique( )),16) as 
ship id, date, mode, part_key from ship_tab) 

[0643] WHERE order_key-l and p.price>20000 and 
p.orderkey-o.orderjcey and s.part_key-p.part_key 

[0644] The following is a table holding the results of 
executing the SQL query: 
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order, customer, customer. part. 

key name email key color qty price tax stupid date mode 

1 General parts@gm. 68 red 36 34850.16 0.06 4.58.825484 1998-08-19 BOAT 

Motor com 

1 General parte@gra. 68 red 36 34850 16 0.06 4.58.825537 1998-08-19 AIR 

Motor com 

1 General parts@gm. 128 red 28 38000.00 0.07 4 58.825589 1998-12-30 TRUCK 

Motor com 



[0645] The data in order_key, customer_oame, customer- 
_emaiL, part_key, qty, price, and tax are duplicated for each 
shipment. The data in order_key, customer_name, and cus- 
tomer_email are duplicated for each part. This issue is 
addressed by partitioning the columns into equivalence 
classes that reflect the semantics of the relational data: 
{order_key, custom cr_name, customer jemail}, {partake y, 
color, qty, price, tax}, and {ship_id, date, mode}. The XML 
System opens a new cursor only when it crosses a boundary 
between classes. 

[0646] A user can decide how structured XML documents 
are to be stored or created through a Document Access 
Definition(DAD). The DAD itself is an XML formatted 
document. The DAD associates XML documents to a data- 
base by defining an Xcollection. The SQLstmt in the DAD 
is an SQL query that specifies how columns of a table are to 
be mapped to XML elements and attributes. The columns in 
the SELECT clause are mapped to XML elements or 
attributes. They will be used to define the value of 
attribute_nodes or content of text_nodes. The FROM clause 
defines the tables containing the data, and the WHERE 
clause specifies the join and search conditions. 

[0647] Assuming the following structure of an XML docu- 
ment will be generated from the data selected by a SQL- 
_stmt, how to use an XML collection to specify the DAD 
will be illustrated below. 



<?xml version-" 1.0"?> 

<!DOCTYPE Order SYSTEM "E:\dxx\test\dtd\lilem.dtfT > 
<Order key-T> 

<Customer>Geaeral Motor </Customer> 



-continued 



<Part key-'*68 M > 

<Quantity>36</Quantity> 

<£xteDdedPrice>34850.16</ExtendedPrice> 

< f Ikx>0.06</rhx> 

<ShipmeDt> 

<ShipDate>1998-04-12</ShipDate> 
<SbipMode>BOAT</ShipMode> 
<Commen t>This shipment is requested by a 

call from GM rnarketing</Commcnt> 
</Shipmcat> 
<ShipmcDt> 

<ShipDate>1998-08-19</ShipDate> 
<ShipMode>AlR</ShipMode> 
<Commeot>This shipment is ordered by an 

email </Comment> 
</Shipment> 
</Part— "128"> 

<Quantity>28</Quantity> 

<FJrteMlcdPrice>38000.00<^Exte«iedPrice> 

<Oax>0.07</Tax> 

<Shipment> 

<ShipDate>1998-12-30</ShipDate> 
<ShipMode>TRUCK</ShipMode> 
<Comment>This is the first shipment to 

service of GM</Comment> 
</ShipmcnL> 
</Part> 
</Onkr> 



[0648] The following sample DAD shows how to define 
the mapping from relational tables to one or more XML 
documents using SQL mapping. The following sample DAD 
shows how to specify an SQL query to compose a set of one 
or more XML documents from data in three relational tables. 



litcm.DAD2.dad 

<?xml veraion-"1.0"?> 

<!DOCTY?E Order SYSTEM " E:\dtd\dxxdad_dtd" > 
<DAD> 

<dUtid>E:\dtd\lincItenLdtd <tftdid> 
<validation>YES </validatton> 
<Xcollection> 
<SOL.stmt> 

SELECT o.orderkey, customer, p.partjcey, qty, price, tax, ship id, date, mode comment 
FROM order.tab o, part.tab p, 

table(select substr(cbar( times tamp (generate .unique0)),l 6) as 
ship.id,date^node^omments from ship.tab) as s 
WHERE p.price > 2500.00 and s.datc > -1996-06-01" AND 

p.orderkey - o.order key and s.part key - p. part key 
</SQL.stmt> 
<objids> 

<column name» - orderJcey*7> 
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-continued 



ccolumn oame-"part .key" /> 
ccolumn name— "ship jd"/> 
</objids> 

<prolog>?xml version- - 1.0"?c/prolog> 

<doctypc>!DOCTYTE Order SYSTEM w E:\dld\lincIlcm.dLd"</doctypc> 

<TOOt.00dc> 

celement_node namc-'*Order w > 
<8ttribute.node name»**Key"> 

ccolumn name-"ordcrJtcy"/> 
</attribute_node> 
celemcnt.node aame-" l Customcr**> 
<text.node> 

ccolumn name— customer" /> 
</text.oode> 
</element.node> 
<elemcnt_node namc-"Part w > 
cailnbute.aode name— "Key" > 

ccolumn name-"partjcey7> 
</aUribute.nodc> 

celemeot .node name-**Quanlity" > 

<text.node> 

ccolumn name- - qty"/> 

c/text_node> 
</dementjiode> 

celementjiode name-*^itendedPrice"> 
ctc3d..node> 

ccolumn namc-"pricc"/> 
</lcxt.node> 
c/clementjiodo 
<xlement_node name-Tax"> 
<text.node> 

ccolumn name—" tax"/> 
</tezt.nodc> 
</ciementjiode> 
celement.nodc name- "Shipments 

eclcment jiodc name— "ShipDatc'*> 
<text.nodc> 

ccolumn name-"datc7> 
cAext_node> 
</elcmcnLnode> 

celementjiode name-"ShipMode'*> 

ctext.node> 
ccolumn namc-*'mode"/> 

c/tcxt node> 
c/element.node> 

celement.node name— "Co mmenlfl"> 
<terl_node> 

ccolumn Dame-"comment"/> 
•c/texLnode> 
</elcmcnt.node> 
</elemenl_nodc><I— end Shipment -> 
c/element.nodcx!— end Part — > 
c/elemcnt jiodcxi— end Order — > 
c/root.node> 
c7Xcollection> 
</DAD> 



[0649] The SQL query should be in a top-down order of 
the relational hierarchy. In the example, it is required to 
specify the selected columns in the order of 3 levels: order, 
part and shipment. Within each level, the objid must be the 
first column. If the order described is not preserved, the 
generated XML documents may not be correct. 

[0650] E.2 How to Use an XML Collection 

[0651] An XML collection is a set of relational tables 
which contain XML data. These tables can be new tables 
generated by the XML System or existing tables which have 
data to be used by the XML System to generate one or more 
XML documents. Stored procedures provided by the XML 
System serve as the access methods. Unlike the XML 



column, an XML collection does not have to be enabled. The 
enablement is based on the operations performed. 

[0652] A composition operation of an XML collection is to 
generate one or more XML documents from data existing in 
the collection tables. Therefore, for this operation, an XML 
collection does not need to be enabled, providing all tables 
already exist in the database. The DAD will be passed to 
stored procedures. The DAD can be overridden by other 
XML query parameters as the stored procedure input param- 
eters. This kind of parameter can be obtained from various 
sources (e.g., dynamically from the web). 

[0653] In the DAD preparation, first "Xcollection" is 
defined. An Xcollection can be defined for composition or 
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decomposition, in the way of either SQL mapping or RDB- 
_node mapping. In both cases, the following steps should 
apply: 

[0654] 1. Specify the DTDID. The DTDID must be 
the same as the system ID in the doctype. 

[0655] 2. Specify the validation. When using 
DTDID, the validation should be always be "YES". 

[0656] 3. Specify the prolog and doctype. For com- 
position, the same prolog and doctype are allowed. 

[0657] E.2.1 Enabling an XML Collection 

[0658] The purpose of enabling an XML Collection for 
decomposition is to parse a DAD, create new tables or check 
the mapping against existing tables. The DAD is stored into 
the XMLJJSAGE table when the XML Collection is 
enabled. 

[0659] When a user prefers to have the XML System 
create collection tables, the user should enable the XML 
collection. Additionally, the enablement depends on the 
stored procedure the user chooses to use. The stored proce- 
dure dxxInsertXML( ) will take XML Collection name as 
the input parameter. In order to use the stored procedure 
dxxInsertXML( ), the user must enable an XML collection 
before calling it. The user can call stored procedure dxx- 
ShredXML( ) without the enabling of an XML collection by 
passing a DAD. In the later caPse, all tables specified must 
exist in the database. 

[0660] E.2.1.1 Enabling XML Collection Option 

[0661] For composition, an XML collection is not required 
to be enabled. The assumption is that all collection tables 
already exist in the database. The stored procedure can take 
a DAD as an input parameter and generate XML documents 
based on the DAD. On the other hand, the composition is the 
opposite of the decomposition. For XML collections enabled 
during the decomposition process, the DAD is likely to be 
used to compose XML documents again. If the same DAD 
is used, then the collection can be enabled for both compo- 
sition and decomposition. 

[0662] An XML Collection can be enabled through the 
XML System administration GUI (graphical user interface) 
or using the dxxadm command with the Enable_collection 
option. The syntax of the option on a DB2 server is as 
follows: 

[0663] dxxadm enable_collection db_name collection- 
_name DAD_file [-t tablespace] 

[0664] where 

[0665] a. db__name: name of the database. 

[0666] b. collection_name: name of the XML collec- 
tion, which will be used as the parameter to the 
stored procedures. 

[0667] c. DAD_file: Data Access Definition (DAD) 
file. 

[0668] d. tablespace: Optional. The tablespace con- 
tains tables in the XML collection. If new tables need 
to be created for decomposition, then new tables will 
be created in the specified tablespace. 



[0669] The following is an example of enabling the XML 
collection called sales_order in database mydb with the 
DAD_file Litem_DAD3. dad. 

[0670] /home/ul>dxxadm enable_collection mydb sale- 
s_order Litem_ DAD3.dad 

[0671] DXXA009I XML System is enabling collection 
salesjorder. Please wait. 

[0672] DXXA010I XML System has successfully 
enabled the collection salesjorder. 

[0673] /home/ul> 

[0674] The enable_collection option mainly does the fol- 
lowing things to a database: 

[0675] e. Read the DAD_file, call XML parser to 
parse DAD, and save internal information for map- 
ping. 

[0676] f. Store internal information into the 
XMLJJSAGE table. 

[0677] The option is good for performance and is usually 
helpful to perform composition and decomposition using 
one DAD. 

[0678] E.2.1.2 Enable collection Option 

[0679] The enable_collection option enables an XML col- 
lection associated with an application table. The association 
between the application table and the side table specified by 
the DAD is through the root_id. 

[0680] Syntax 

[0681] dxxadm enable_collection db_name collection 
DAD_File [-t tablespacel-l loginj-p password] 

[0682] Argument 

[0683] g. db_name: the database name. 

[0684] h. collection: the name of an XML collection. 

[0685] i. DAD_File: the file containing the DAD. 

[0686] j. tablespace: Optional. The tablespace con- 
taining a user table specified in the DAD or side table 
created by the XML System. 

[0687] k. login: Optional. The user ID, which is only 
needed if the command is invoked from a DB2 
client. 

[0688] 1. password: Optional. The password, which is 
only needed if the command is invoked from a DB2 
client. 

[0689] The enablecollection option will enable an XML 
collection. The enablement process is to parse the DAD and 
prepare tables for XML collection access. It takes the 
database name, a name of the XML collection, a DAD_File 
and an optional tablespace. The XML collection will be 
enabled based on the DAD in the DAD_File. It checks 
whether the tables specified in the DAD exist. If the tables 
do not exist, the XML System will create the tables accord- 
ing to the specification in the DAD. The column name and 
data type is taken from the RDB_node of an attribute_node 
or text jiode. If the tables exist, the XML System will check 
whether the columns were specified with the right name and 
data types in the corresponding tables. If a mismatch is 
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found, an error will be returned. The tablespace is optional, 
but it is specified if the collection tables are to be created in 
a tablespace other than the default tablespace of the specified 
database. 

[0690] The enable_collection is required for decomposi- 
tion stored procedure dxxInsertXML( ), and its pairing 
dxxRetrieveXML( ), and the dxxUpdateXML( ). For stored 
procedure dxxGenXML( ) and dxxShredXML( ) which take 
a DAD as input, the enablement of an XML collection is not 
required. For the latter stored procedures, it is assumed that 
all tables specified in the DAD exist in the database already. 
If they don't exist, an error will be returned. The enable_c- 
ollection does have a pairing disablecollection option. But 
the operation of disable_collection is much simpler. It just 
deletes the collection from XML_USAGE table. 

[0691] FIG- 7 is a diagram illustrating code organization 
to compose XML documents. The dxxGenXML( ) stored 
procedure 700 and the dxxRe trie veXML( ) stored procedure 
702 each invoke the dxxComposeXM L( ) module 704. 
Decision block 706 determines whether RDBjxxle map- 
ping is to be performed. If RDBoode mapping is not to be 
performed, processing continues with SQL to XML module 
708 and the SQL to XML mapping is performed by the SQL 
to XML mapping technique 710. If RDBnode mapping is 
to be performed, processing continues with the dxxGenXM- 
LFrom RDB( ) module 712 and RDB_node mapping is 
performed by the RDBR technique. 

[0692] E3 Using SQL Mapping Scheme 

[0693] The mapping between composed XML documents 
and an XML collection is specified in the Xcollection of a 
DAD. The XML System adapts the notation used in XPath 
and uses a subset of it to define the XML document 
structure. In order to facilitate the mapping, the XML 
System introduces the element SQL_stmt to the Xcollection. 

[0694] The DAD defines the XML document tree structure 
using seven kinds of nodes defined by XPath: 

[0695] root_node 

[0696] element_node 

[0697] text_node 

[0698] attribute_node 

[0699] namespace_node 

[0700] processing_instruction_node 

[0701] comment_node 

[0702] The element SQL_stmt is designed to allow simple 
and direct mapping from relational data to one or more XML 
documents through a single SQL statement. It is useful for 
the composition when application programmers know 
exactly what data they want to select from a database and 
compose the one or more XML documents. The content of 
the SQL_stmt must be a valid SQL select statement The 
columns in the SELECT clause are mapped to XML ele- 
ments or attributes. They will be used to define the value of 
attribute_nodes or content of text_nodes. The FROM clause 
defines the tables containing the data, and the WHERE 
clause specifies the join and search conditions. 

[0703] In the definition of an Xcollection, for this embodi- 
ment of the invention, the following approach is used to 
define the SQL mapping: 



[0704] Provide direct SQL statement in one optional 
SQL_stmt, then specify the mapping between the 
columns in the SQL statement and text_nodes or 
attribute_oodes in the XML data model. In this case, 
the XML System will use the SQL statement to select 
the data for composition or insert data for decom- 
position. 

[0705] The text_node and attribute_node will have a one- 
to-one mapping to/from a column in a relational table. 
Therefore, each of them will have a column to define the 
mapping, where the column is needed for SQL mapping. It 
is possible that an element_node has no text_node but only 
child element_node(s). 

[0706] The SQL mapping is simple and powerful. For 
SQL mapping, a user may join all tables in one select 
statement to form a query. 

[0707] The SQL mapping requires a user to supply an 
SQL_stmt in a DAD. To simplify the demonstration, the 
following steps guide a user to define an 'Xcollection* for 
composition, using SQL mapping. The composed XML 
document orderjcml is in C5, the given DTD Lineltem.dtd 
is in C4, and Litem_DAD2.dad in E.l. 

[0708] Define only one SQL_stmt in the Xcollection. In 
the SQL_stmt, supply an SQL query which joins all tables 
of the XML collection and select desired columns from these 
tables. The predicate as the query condition should be 
specified in the WHERE clause. In the SELECT clause, the 
column name to be selected should be unique. In case of two 
columns of different tables having the same name, use the 
AS to make them different. This is because the column name 
specified in the select clause will be used to identify the 
value of an attribute_node or text_node. 

[0709] Note, in the examples of this section, three 
tables: order_tab, part_tab and ship_tab are joined 
through the order_key and part_key, and orderjcey, 
part jcey, price, qty, tax, date, mode and comment are 
selected. The query condition is price>2500.00 and 
date>"Jun. 1, 1996". 

[0710] Define the "ORDER BY" clause at the end of 
SQL_stmt. For each table, define a column or a unique 
identifier so that the one or more XML documents generated 
will be ordered by it. In the example, the primary key 
column is used for order_tab and part_tab. However, it is not 
necessary to use the primary key as long as it is unique and 
identifies a row object in the table. It can be generated by 
using the generate_unique( ) function or a user defined 
function (UDF) and does not need to be a real column name. 
However, "AS** followed by a conceptual column name in 
the SQL stmt must be used, first, then that name is used in 
the "ORDER BY** clause. Ship id is used in the example for 
shiptab in this way. 

[0711] Define the tree data structure of the XML docu- 
ments by specifying the root_node. The root_node should 
have only one child element_nodc, in the example, the 
element_node for "Order**. 

[0712] Specify column name of each attribute_node and 
text_node in the document tree structure. It must be speci- 
fied, and the column name must be in the select clause of the 
SQL_stmt. A user can also specify an optional condition to 
qualify the documents to be generated. 
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[0713] E.4 Detailed Techniques 
[0714] E.4.1 Levels 

[0715] la the relational data model, entities may have 
one-to-many relationships. For example, one order may 
have many parts, and one part may have many shipments. If 
these relationships are visualized in the form of a tree, order 
could be regarded as the root, which has parts as its children, 
and each part has shipments as its children. For this discus- 
sion, the different levels of this tree are called "relational 
levels." 

[0716] An XML document also has a tree structure that 
consists of elements at different levels in the tree. Unfortu- 
nately, these levels do not necessarily match the levels in the 
relational model. For example, although Customer and Part 
are at the same level in the XML tree (since both are 
immediate children of Order), an Order may have multiple 
Parts and it can have only one Customer. Therefore, the 
element Part needs to be generated in a way which is 
different from the way in which Customer is generated. To 
generate Parts, the XML System opens a new cursor and 
loops through parts. To generate Customer, the XML System 
retrieves data from the same descriptor (SQLDA) as Order, 
and no new cursor or loop is required. 

[0717] In one embodiment, the implicit stack of recursion 
keeps track of the XML levels only. An additional data 
structure (a level map) is used to keep track of the relational 
levels in order to make the technique behave properly. 

[0718] To generate a level map, the columns of the SQL 
query result are partitioned into equivalence classes, such 
that the columns in each class are at the same relational 
level. Because the notion of relational levels are in the 
semantics of the data, it is generally impossible to deduce 
this partition information from the SQL query alone, espe- 
cially in legacy databases where tables have been created 
without proper declarations of primary keys and foreign 
keys. 

[0719] In one embodiment, the user specifies the partition 
in a DAD by deciding which pieces of data should "come 
together as a class conceptually." In the example, order_key, 
customer name, and customer_email come together to form 
the conceptual class Order. Similarly, date and mode should 
come together to form the notion of a shipment. 

[0720] In another embodiment, the partition is generated 
automatically using some heuristics. One heuristic technique 
assumes that the columns in the result of the SQL query are 
in a top-down order of the relational hierarchy. It also 
requires a single-column candidate key to begin each level. 
If such a key is not available in a table, the query generates 
one for that table using a table expression and the built-in 
fumction generate_unique( ). For further illustration, refer to 
the query in the example to see how it handle ship_tab, 
which does not have a single-column candidate key. 

[0721] The technique selects distinct counts from the 
result of the SQL query on the first column, the first and the 
second, the first and the second and the third, and so on. It 
starts a new partition whenever it detects a change in the 
distinct counts. Because of a restriction of the "select 
distinct" feature of DB2®, any character data longer than 
254 bytes will be truncated. 



[0722] The following data structures are used: 
[0723] levelmap 

[0724] Levelmap is an associative array that maps 
column names to their equivalence class numbers or 
"relational level " The equivalence classes in ascend- 
ing order of relational levels should have one-to- 
many relationship between each adjacent classes 
with the "many" side at the upper level. In the 
example, "orderjcey" maps to 0; "part Jcey" maps to 
1; and "date" maps to 2. The associative array can be 
implemented in memory, for example, as a hash 
table, a sorted array, or a binary search tree. 

[0725] The following are methods of levelmap: 

[0726] acquire( ) 

[0727] It uses some heuristics to determine the par- 
tition to initialize the level map. This method 
requires SQL access to dxxcache. 

[0728] int getLevel(column) 

[0729] It retrieves the relational level of the column. 

[0730] char *[feetColumns(int level) 

[0731] It retrieves the column names of the specified 
level. 

[0732] release< ) 

[0733] It deallocates the memory that was allocated 
in acquire( ). 

[0734] outbuf 

[0735] It is an automatically expandable buffer for 
holding an XML document during construction. At 
the end of the construction, outbuf holds the entire 
XML document. 

[0736] The following are methods of outbuf: 

[0737] acquire(int estimated_size, location) 

[0738] It allocates the buffer using the estimated size 
as the initial size. The location parameter specifies 
whether the buffer will be based on main memory or 
a disk file. 

[0739] append(string) 

[0740] It appends the string into the buffer, and 
expands the buffer if necessary. 

[0741] char *getContent( ) 

[0742] It retrieves the content of the buffer if it is in 
memory or the filename if it is in a file. 

[0743] rcleasc( ) 

[0744] It deallocates the buffer. 

[0745] E.4.2 Pseudocode for Implementation 

[0746] The following is a set of pseudocode for imple- 
menting a stored procedure to generate an XML document 
from a single SQL query in an embodiment of the invention: 
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dxxGenXML(dadbuf, resulLJabname, m, q, sqlstate, msgtext) 

dadbuf a memory buffer containing a document access definition 

resulCJabname — the name of the table into which the resulting XML documents 
will be stored 

m the maximum number of documents/rows to return; 0 if none. 

a OUT: the actual number of documents/rows generated 

sqlstate OUT: the SQL state in case of errors 

msgtext OUT: message text in case of errors 

dxxGenXML(dadbuf, result_tabnamc, m, n, sqlstate, msgtext) { 

/" Parse the DAD and prepare the SQL query */ 

dad - the DOM tree of dadbuf; 

Perform the query and save the result into cache table, 

db2xm).dxxcache; 
Initialize levclmap by a heuristic technique. 

top columns - level map- >getCohunns(0); 

top query - "select distinct top_columns from dxxcachc**; 

PREPARE q INTO :sqtda FROM :top_qucry; 
DESCRIBE q INTO sqlda 
DECLARE cur CURSOR FOR q; 
OPEN cur, 
for i - (l~n) { 

FETCH cur USING DESCRIPTOR sqlda; 
if(SQLCODE — +100) r no more data •/ 

goto exit; 
ourouf->acquire(esriTnatrd_size); 
gcuXMLDoc(sqlda, levclmap, dad, 

& outbuf, re, msgtext); 
/• outbuf now contains an XML document */ 
INSERT INTO result_tabname VALUES outbuf; 
outbuf- > releaseO ; 

} 

msgtext - "Result exceeds maximum. " 

"Only the first n documents arc returned."; 

exit: 

CLOSE cur, 

Drop table dxxcache; 

gcnXMLDoc(sqlda, levclmap, dad, outbuf, re, msgtext) { 

/• Generate an XML document */ 

/* Retrieve the header information. */ 

Get the prolog and doctype from dad by DOM APL 

and write them to outbuf; 
/• Get the root element from DAD. 7 

Get the element node under the root node from dad by DOM APL 
genXMLElement(root_element_node, sqlda, 0, levclmap, 
outbuf, rc, msgtext); 

genXMLElement(node, sqlda, currentlevel, levclmap, outbuf, rc, 
msgtext) { 

Add the opening of the element start-tag to outbuf; 

Get all attribute children of this element from dad by DOM APL 
For each attribute, 

gcaXMLAttributc(attrnode, sqlda, 
outbuf, rc, msgtext); 
Add the closing of the element start-tag to outbuf; 
Get all other children of this element from dad by DOM API. 
For each child, do { 

if the child is a text node { 

gcnXMlXexu^node, sqlda, outbuf, rc, msgtext); 
}etse{ 

Get the level of the first column of this child by DOM 
API. 

if (childlevel < currentlevel) 

handle the error, 
if (childlevel — currentlevel) { 

genXMLElemen^childnodc, sqlda, currentlevel, 

levclmap, outbuf, rc, msgtext) 
}else{ 

/• Open another cursor to generate 

possible multiple occurrences at 

this level 7 
Get the columns of the child's level by 
levelmar»getColumns(childlcveI). 
Construct a whe re-clause from sqlda by equating 

all pairs of column names and values, 
query = "select distinct columns from dxxcache " 
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"where wberc_clause_from_sqlda"; 
PREPARE q INTO :childaqtda FROM :qucry, 
DESCRIBE q INTO :childsqlda 
DECLARE childcur CURSOR FOR q; 
OPEN childcur, 
while (1) { 

FETCH childcur USING DESCRIPTOR childaqlda; 
if(SQLCODE — +100) /" no more data *l 
goto done; 

gcaXMLEkn*nt(childaodc, childsqlda, childlcvcl, 
levebnap, 

& outbuf, rc, msgtcxt); 

} 

} 

done: 

CLOSE childcur, 

} 

Add element end-tag to outbuf; 

genXMLAttnbute(node, sqlda, outbuf; rc, msgtext) { 
Get the value of the attribute from its column in sqlda. 
Append to outbuf: " name - \ M value\""; 

genXMLText(node, sqlda, outbuf, rc, msgtcxt) { 
Get the text of node from its column in sqlda. 
Append the text to outbuf. 



[0747] Given the approach taken for the formulation of a 
query, there is a problem of duplicated data in certain 
higher-level columns, such as customer_name in the 
example, to be tackled. A solution to this problem is to group 
or "aggregate" the columns that have one-to-one mapping 
into an equivalence class. An advantage of this solution is 
that the XML System does not need to parse the user's SQL 
query. 

[0748] To eliminate the duplicates in higher-level col- 
umns, the result of the SQL query is traversed at least once 
for each level. By saving the result into a cache, such as a 
temporary table, executing the query multiple times is 
avoided. The size of this cache table is usually smaller than 
some of the original user tables and no join is needed for 
querying the cache. 

[0749] To return a result set, the stored procedure opens a 
cursor and leaves it open. The stored procedure still needs a 
table for the query for which the cursor is declared 

[0750] In DB2<8>, a result set is available only to client 
programs that are written using Call Level Interface (CLI) 
and not using static SQL. On the other hand, any SQL client 
can have access to a result table. 

[0751] E.4.3 Code Organization 

[0752] As for code organization, the XML System code 
consists of a stored procedure, dxxGenXML, some SQL C 
functions called by the stored procedure, and a few C++ 
classes or C structs for defining the necessary data struc- 
tures. The data structures are defined as C structs because of 
the rules of DB2® Extenders. The module can be linked into 
the db2xml DLL with other stored procedures. It interacts 
with an XML4C parser using the single document interface 
functions: dxxIniualizeParscr and dxxDOM, which have 
already been implemented and used by enablc_column. 

[0753] FIG. 7 is a diagram illustrating code organization 
to compose XML documents. The dxxGenXML( ) stored 



procedure 700 and the dxxRetrieveXML( ) stored procedure 
702 each invoke the dxxComposeXML( ) module 704. 
Decision block 706 determines whether RDBnode map- 
ping is to be performed. If RDB_node mapping is not to be 
performed, processing continues with SQL to XML module 
708 and the SQL to XML mapping is performed by the SQL 
to XML mapping technique 710. If RDB node mapping is 
to be performed, processing continues with the dxxGenXM- 
LFrom RDB( ) module 712 and RDB node mapping is 
performed by the RDBR technique. 

[0754] E5 Components and Row Diagram 

[0755] FIG. 8 is a block diagram illustrating components 
of the XML System in one embodiment of the invention. 
Relational tables 800 store relational data. A Document 
Access Definition (DAD) 802 defines an Xcollection 804 
and a SQL query 806. A Document Type Definition (DTD) 
808 is used to validate and define the DAD 802. The SQL 
query is used to retrieve data from the relational tables 800. 
Using the DAD 802, the SQL query 806, and the XML 
composition stored procedures 810, the XML system gen- 
erates one or more XML documents 812. The XML system 
stores the data used to generate the one or more XML 
documents in an XML Collection table 814. Although the 
relational tables and XML Collection tables are shown in 
different data storage devices 800 and 814, both types of 
tables could reside at one data storage device. 

[0756] FIG. 9 is a flow diagram illustrating the steps 
performed by the XML System to transform relational data 
into one or more XML documents using SQL mapping. In 
block 900, the XML System receives a DAD comprising an 
Xcollection definition. The Xcollection definition includes a 
SQL_query element, which is a valid SQL query. In block 
902, the XML System parses the DAD and prepares the 
SQLjquery. In block 904, the XML System retrieves data 
selected by the SQL query and stores the data in a cache 
table. In block 906, the XML System removes duplicates 
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using XML composition stored procedures. In block 908, the 
XML System generates one or more XML documents using 
the SQL_query, the XML composition stored procedures, 
and the DAD. In particular, the XML System uses these 
components to map each column of the retrieved data to an 
XML element or attribute. Then, the XML System stores the 
data used to generate the one or more XML documents in an 
XML Collection. One skilled in the art would recognize that 
the one or more XML documents could be stored in another 
manner, for example, in other types of tables or as a file. 

F. Generating One or More XML Documents From 
a Relational Database Using the XPath Data Model 

[0757] This invention presents a technique for generating 
one or more XML documents from relational database tables 
using the XML Path Language (Xpath) data model. XPath 
models an XML document as a tree of nodes, including 
element nodes, attribute nodes and text nodes. These nodes 
form a Document Object Model (DOM) tree. 

[0758] In particular, the technique of the invention 
traverses a Document Object Model (DOM) tree generated 
from an XML formatted Data Access Definition (DAD), 
generates hierarchical SQL statements to query data from 
relational tables, then generates one or more structured XML 
documents. Using this invention, a user can directly map 
data in an existing database to one or more XML documents, 
without requiring the transformation from data in a rela- 
tional database to data in an intermediate XML format. 

[0759] This invention implements a stored procedure that 
takes a Data Access Definition (DAD) and a name of a result 
table and returns a result table that is populated with the one 
or more generated XML documents. The DAD defines the 
mapping from the relational tables to the one or more 
generated XML documents. In a preparation stage, the 
technique traverses a DOM tree to gather information of 
each database table to be used in generating one or more 
XML documents. Then, the technique will generate SQL 
statements, query relational data, and write XML document 
tree contents in a recursive manner. During the recursive 
processing, the SQL statements are generated by using the 
previously prepared information and passing join values 
down from a higher level SQL query to a lower level 
WHERE clause. The result of each query will be taken as the 
XML attribute value and element text to be written to the 
output XML documents. 

[0760] FIG. 10 is a flow diagram illustrating the process 
performed by the XML system using RDB_node mapping to 
compose XML documents. In block 1000, the XML system 
generates a document object model (DOM) tree from an 
XML formatted data access definition (DAD). The tree 
comprises relational database nodes (i.e., element nodes, 
attribute nodes, and text nodes). In block 1002, the XML 
system traverses the DOM tree to generate SQL queries. In 
particular, the relational database nodes identify relational 
tables and columns from which relational data is to be 
retrieved, along with predicates and join relationships 
between tables. In block 1004, the XML system executes 
SQL queries to retrieve relational data. In block 1006, the 
XML system maps the relational data to one or more XML 
documents using the DAD. The DAD defines a mapping 
between the relational data and one or more XML docu- 
ments. Furthermore, the relational data maps to attribute 
values or element text of an XML document. 



[0761] F.l Example 

[0762] The following is an example of generating an XML 
document from a relational database using an RDB_node 
(which defines the mapping between an XML element or 
attribute and relational data) in the DAD. In particular, a 
relational database is illustrated. Then, the results of per- 
forming SQL queries against the relational database are 
illustrated. Moreover, the Document Type Definition (DTD) 
and Document Access Definition (DAD) are provided. After 
this, one XML document that is generated to contain the data 
retrieved by the SQL query is illustrated. 

[0763] Relational Database: 



ortter_tab: 

order_key costomer_name costomcr_cmail aistomer_phone 

1 General Motor parts@gm.oom 800-GM-PARTS 

part_tab: 



part_key color qty price tax order_Jtey 



156 red 17 17954.55 0.02 


1 


68 black 36 34850.16 0.06 


1 


128 red 28 38000.00 0.07 


1 


ship_tab: 




date mode comment 


part— key 


1998-03-13 TRUCK This is the first shipment to service 


156 


of GM. 




1999-01-16 FEDEX This the second shipment to service 


156 


of GM. 




1998-08-19 BOAT This shipment is requested by a call 


68 


from GM marketing. 




1998-08-19 AIR This shipment is ordered by an email. 


68 


1998-12-30 TRUCK NULL 


128 



[0764] The following is an XML Document that is to be 
generated from the above relational data: 



<?xml version-" 1.0"?> 

<!DOCTYPE Order SYSTEM "E:\dir\tcstVltd\Uteoi.dtd"> 
<Order key-"l"> 
<Customer> 

<Name>Geneial Motor</Name> 
^Fnwil-a^Ttg/ggm rnm</Fnun'l> 
<VCustomer> 
<Part color—* 4 rcd"> 
<key>68</key> 
<Quantiry>36 </Quantity> 
<ExtendedPrice>34850.1 6</ExtendedPrice> 
«^Iax>0.06</Tax> 
<Shipment> 

<ShipDate>1996-04-l2</ShipDate> 
<SriipMode>BOAT</ShipMode> 
</Shipmcnt> 
<Shipment> 

<ShipDate>1998-08-19</ShipDate> 
<ShipMode>A[R</ShipMode> 
</Shipmcnt> 
</Part> 

<Part colop-**red"> 
<key>128</kev> 
<OuantUy>28 ^Quantity > 
<ExtendedPrice>38000.00</ExtendedPrice> 
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<fftx>0.07</Tkx> 
<Shipmcnt> 

<ShipDate>1998-32-30</ShipDate> 
<ShipModc>TRUCK</ShipModc> 
c/Shipmeal> 
</Part> 
c/Ordcr> 



[0765] Assuming the following structure of an XML docu- 
ment will be generated from the data selected by a SQL- 
_stmt, how to use an XML collection to specify the DAD 
will be illustrated below. 



<?xml version-" 1.0"? > 

<!DOCTYPE Older SYSTEM "E:\dtd\dxxdad.dtd"> 
<DAD> 

<dtdid> E :\dtd\l inel te m. did </dtdid> 
<validation>YES</validatio n> 
<XcoUection> 

cprolog>?xml version-" 1.0"? </piolog> 
<xioctype>!DOCTYPE Order SYSTEM "E:\dtd\lincltrnxdid" 
</doctypc> 
<root_nodc> 

<rlcment _nodc name- , *Order"> 
<RDB_node> 

ctable name-"oider_tab7> 
ctablc name-"part_tab7> 
<table name-"ship_Jab7> 

order_tab.order_key - part_tab.order_key AND 
part__tab.DarLjcey - &hip^tab.part„Jccy 
</condition> 
c/RDB_jiodc> 

<aUribute_node namc-"Key**> 
<RDB_node> 

<table name-"order_tab7> 
<column name-"order_Jcey7> 
</RDB_node> 
</&ttribute node> 

celement_node name— "Customer" > 
<element_ttode name-** Name" > 
ctext_node> 
<RDB__node> 

<tablc name— "*ordei_tab7> 

ccolumn name— "customer namc"> 

</RDB_node> 
</texl__node> 

c/elemcnt node> 

<element node name— "Email" > 

ctext node> 

<RDB_nodc> 

•ctablc name— "order _tab7> 

•ccolumn name— "customer email7> 

</RDB_node> 

c/text node> 

</elemcnt node> 

c/elemcnt_node> 
cclcment__nodc name— "Part"> 

cattribute node name-"Key"> 

<RDB_node> 
ctable name— "part_tab"> 
<column name— "part _Jcey"> 
</RDB _node> 
</attribut©-node> 

celement_node name— "ExtendedPrice**> 
<tert __node> 
<RDB_node> 

ctablc name-"part_tab7> 
<column name— "pricc7> 
<condition> 
price > 2500.00 



-continued 



</condition> 
c/RDB_nodc> 
c/lcxt_nodc> 
</element_nodc> 

celcment node name— Tax"> 

<text_jiode> 
<RDB_jiode> 

•ctable name— "part_tab7> 
ccolumn name— "tax7> 
</RDB„nodc> 
c/text_node> 

celement node name— "Part" > 

cattribute node name-"Kcy**> 

<RDB _nodc> 

<table name-"part__lab7> 
•ccorumn name— "part__xey7> 
</RDB_node> 
</attribute_jiode> 
<elcment_node name-"Quantity" > 
<texL_nodc> 
<RDB_jiode> 

<table name-"parL_tab7> 
<column namc-"qty7> 
</RDB_node> 
</text_node> 

</elemcnt node> 

</element__node> 

celcment node name— "shipment" > 

celement __node name-"ShipDate"> 
<text_jiode> 
<RDB_node> 

<table n i"P *— "ffh ip ^h "/** 
ccolumn namc-"date7> 
<condilion> 

date > "1966-01 -Or 
c/condilion> 
</RDB_node> 
c/text __node> 
</dement__node> 

celement node name— "ShipMode"> 

ctext __node> 
<RDB _jiodc> 

<tablc name— "ship tab"/> 

■ccolumn name— "modc7> 
</RDB__node> 
c/text_node> 
<Velemcnt_node> 

celement node name— "Commcnt"> 

<text_node> 
<RDB_xode> 

ctable name-"ship__tab7> 
ccorumn name— "comment7> 
</RDB_node> 
</text__node> 
</element_node> 
</element_node><! — end of element Shipments 
</clemem__node><! — end of element Part — > 
<Vclcment_nodexl - end of clement Order — > 
<^root_node> 
c/Xcollcction> 
</DAD> 



[0766] Assuming the XML documents need to be com- 
posed or decomposed are like the one shown in the example 
in Section El, the following sample DAD shows how to 
define the mapping from relational tables using RDB_node 
Mapping. In particular, the following example DAD shows 
how to compose/decompose a set of XML documents from/ 
into three relational tables while using the RDB_node to 
specify the mapping. 
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Utem_J>AD3.dad 

<?xml version-" 1.0**?> 
<tDOCTYPE Older SYSTEM "E:\dtd\dad.dUT> 
<DAD> 

<xltdid>E:\dtd\liiicltetn.dtd </dldid> 

<va!idation>YES</vaUdatioii> 

<Xcollcction> 

<prolog>?xml version-" 1.0"? </prolog> 

<da*ype>!DOCIYFE Older SYSTEM "E:\dtd\lmeItem-drd"</docrype> 
<root_jiode> 

<clement_node name-"Order"> 
<RDB„_node> 

<table name-"oider_tab" key-**order_jEey*7> 
<table name-"pajt_lab' t kcy-* t part_kcy'V> 
<table name— "ship_tab" key-"date"/> 
<cooditioii> 

ordet_tab.order_Jcey - part_tab.order_key AND 
part_tab.part_key - ship_tab.part_Jflcy 
</coadition> 
</RDB_oode> 

<attzibute_DDde name— ~Key"> 

<RDB_node> 

<table name-"order_tab*Y> 

<column name-border _Jcey" type—*' integer" /> 

</RDB_nodo 

</aUxibutc_node> 

<elemenl_aode name— "Customer"> 

<text_node> 

<RDB__node> 

<table namc-"order_tab'*/> 

<column name— "customer" type"**char(128)"/> 

</RDB_node> 

</text__node> 

</elcment nodc> 

<e]cmeni_node name— "Part"> 

<attribute_node name— ~Key"> 

<RDB_node> 

<table name-"part_tab'7> 

<co!umn namc-**part_iey M type-"integer*7> 

</RDB_r*odo 

</attribute nodc> 

celemeni _node name-"Quantity"> 

<text_node> 

<RDB_node> 

<table name-"part_tab'7> 

<column name-"qty" type-"integer*7> 

</RDB_node> 

</tert__ricKie> 

</elcmcnt__nodc> 

<elemcnt_node name-"EjttendedPrice"> 

<text_node> 

<RDB_rjodc> 

<tablc oame-'*part_tab"/> 

ccolumn oamc-'^prux" type— "reaT7> 

<conditioa> 

price > 2500.00 

</condiiion> 

</RDB_oode> 

</text__node> 

</clcmcnt__nodc> 

<xlcmcnt_nodc name-"Tax w > 

<tert_Dode> 

<RDB_rjode> 

<table name-**part_tab"/> 

ccolumn name-"tai" type—**real7> 

</RDB_node> 

</text_node> 

</element_jiode> 

ceIemcol_node oame-"shipment"> 

eelement node name— "ShipDate"> 

<text_node> 

<RDB_node> 

<table name-**ship_lab*7> 

<column name-" date" type— "date*7> 

<condltion> 

date > "1966-01-01" 



-continued 



</condition> 
<VRDB_node> 
</texL_node> 
</elemeat_nodc> 

<eicmcnt_nodc name— "ShipMode"> 

<text_jiode> 

<RDB_rtode> 

<table name-"ahip_tab"/> 

ccolumn name— "mode" cype><hai(120)r/> 

</RDB_nodc> 

</text_jnode> 

</eLement node> 

<element_node name-*Commenr> 

<±exfc_jiode> 

<RDB__node> 

<table n a"ie — "fthi p f f> ^ "A** 

ocohimn name— "comment" type— "varchai(2k) r 7> 

</RDB_node> 

</text_jnode> 

</element_node> 

</element__node><! — end of element Shipment* 

</eLemenl__node><! — end of element Part — > 

</element_node><! — end of element Order — > 

</root_j»de> 

</Xcollection> 

</DAD> 



[0767] F.2 How to Use an XML Collection 

[0768] An XML collection is a set of relational tables 
which contain XML data. These tables can be new tables 
generated by the XML System when decomposing XML 
documents or existing tables which have data to be used by 
the XML System to generate XML documents. Stored 
procedures provided by the XML System serve as the access 
methods. Unlike the XML column, an XML collection does 
not have to be enabled. The enablement is based on the 
operations performed. 

[0769] A composition operation of an XML collection 
generates one or more XML documents from data existing 
in the collection tables. Therefore, for this operation, an 
XML collection does not need to be enabled, providing all 
tables already exist in the database. The DAD will be passed 
to a stored procedure. The DAD can be overridden by other 
XML query parameters as the stored procedure input param- 
eters. This kind of parameter can be obtained from the Web 
dynamically. 

[0770] In the DAD preparation, "Xcollection" is defined 
first. An Xcollection can be defined for composition with 
RDB_oode mapping. The following steps apply: 

[0771] Specify the DTD ID. The DTDED must be the 
same as the system ID in the doctype. 

[0772] Specify the validation. When using DTDID, 
the validation should be M YES**. 

[0773] Specify the prolog and doctype. For compo- 
sition, the same prolog and doctype are allowed. 

[0774] Specify RDB_node mapping, which will be 
described in Section ¥5 below. 

[0775] When using the RDB oode mapping, the RDB n- 
ode should be defined for a root element_node and each 
text node and attribute_node. The RDB_node defines the 
table and column in the relational database which is to be 
mapped to an XML element or attribute. 
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[0776] The following illustrates the mapping with RDB- 
nodes by the sample DAD Litem_DAD3.dad. This basi- 
cally describes how to specify the RDB_node. 

[0777] Define the tables and the predicate to join 
these tables for the root element_node (the element- 
_node under the root node). All tables which con- 
tribute data for the root element should be included, 
and the condition to join these tables should be 
stated. The primary and foreign key relationship for 
join is strongly recommended, but not required. In 
the example, the root element_node Order's RDB- 
_node has three tables: order_tab, part_tab and 
ship_tab as shown below. 

[0778] For each attribute_node or textnode, the 
table and column which contains its data are speci- 
fied. If a user has a select condition, the predicate in 
the condition of the RDB_node is specified. In the 
example, for text_node "Extended Price", the table is 
specified as parMab, column as price, and the con- 
dition as "price>2500.00". 

[0779] In the RDB_node mapping, the XML System will 
traverse the document tree structure to generate the XML 
documents. 

[0780] <element_node name-" Order" > 

[0781] <RDB_node> 

[0782] <table name-"order_tab" key-"order- 

_key7> 

[0783] <table name-"part_tab" key»"partJcey"/> 

[0784] <table name-"ship_tab" key-"date" 
orderBy-"date7> 

[0785] <condition> 

[0786] order_tab.order_key-part tab.orderjtey 
AND 

[0787] part_tab.part_key-ship_tab.part_key 

[0788] </condition> 

[0789] </RDB_node> 

[0790] F.2.1 Enabling an XML Collection 

[0791] The purpose of enabling an XML Collection for 
decomposition is to parse a DAD, create new tables or check 
the mapping against existing tables. The DAD is stored into 
the XMLJJSAGE table when the XML Collection is 
enabled. 

[0792] When a user prefers to have the XML System 
create collection tables, the user should enable the XML 
collection. Additionally, the enablement depends on the 
stored procedure the user chooses to use. The stored proce- 
dure dxxlnsertXML( ) will take XML Collection name as 
the input parameter. In order to use the stored procedure 
dxxInsertXML( ), the user must enable an XML collection 
before calling it. The user can call stored procedure dxx- 
ShredXML( ) without the enabling of an XML collection by 
passing a DAD. In the later case, all tables specified must 
exist in the database. 



[0793] F.2.1.1 Enabling XML Collection Option 

[0794] For composition, an XML collection is not required 
to be enabled. The assumption is that all collection tables 
already exist in the database. The stored procedure can take 
a DAD as an input parameter and generate XML documents 
based on the DAD. On the other hand, the composition is the 
opposite of the decomposition. For XML collections enabled 
during the decomposition process, the DAD is likely to be 
used to compose XML documents again. If the same DAD 
is used, then the collection can be enabled for both compo- 
sition and decomposition. 

[0795] An XML Collection can be enabled through the 
XML System administration GUI (graphical user interface) 
or using the dxxadm command with the Enable_collection 
option. The syntax of the option on a DB2 server is as 
follows: 

[0796] dxxadm enablejcoUection db_name collection- 
_name DAD file [-t tablespace] 

[0797] where 

[0798] a. db_name: name of the database. 

[0799] b. collection_name: name of the XML collec- 
tion, which will be used as the parameter to the 
stored procedures. 

[0800] c. DAD_file: Data Access Definiuon (DAD) 
file. 

[0801] d. tablespace: Optional The tablespace con- 
tains tables in the XML collection. If new tables need 
to be created for decomposition, then new tables will 
be created in the specified tablespace. 

[0802] The following is an example of enabling the XML 
collection called sales order in database mydb with the 
DAD_file litem_DAD3.dad. 

[0803] /home/ulxlxxadm enable_collection mydb sale- 
sorder Litem_DAD3.dad 

[0804] DXXA009I XML System is enabling collection 
sales_order. Please wait. 

[0805] DXXA010I XML System has successfully 
enabled the collection sales_order. 

[0806] /home/ul> 

[0807] The enable_collection option mainly does the fol- 
lowing things to a database: 

[0808] e. Read the DAD file, call XML parser to 
parse DAD, and save internal information for map- 
ping. 

[0809] f. Store internal information into the 
XMLJJSAGE table. 

[0810] The option is good for performance and is usually 
helpful to perform composition and decomposition using 
one DAD. 

[0811] F.2.1. 2 Enable collection Option 

[0812] The enablejcollection option enables an XML col- 
lection associated with an application table. The association 
between the application table and the side table specified by 
the DAD is through the root_id. 
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[0813] Syntax 

[0814] dxxadm enable_collectioo dbname collection 
DADFile [-t tablespaoel-l loginj-p password] 

[0815] Argument 

[0816] g. db_name: the database name. 

[0817] h. collection: the name of an XML collection. 

[0818] i. DAD_File: the file containing the DAD. 

[0819] j. tablespace: Optional. The tablespace con- 
taining a user table specified in the DAD or side table 
created by the XML System. 

[0820] k. login: Optional. The user ID, which is only 
needed if the command is invoked from a DB2 
client 

[0821] 1. password: Optional. The password, which is 
only needed if the command is invoked from a DB2 
client. 

[0822] The enable_collection option will enable an XML 
collection. The enablement process is to parse the DAD and 
prepare tables for XML collection access. It takes the 
database name, a name of the XML collection, a DAD_File 
and an optional tablespace. The XML collection will be 
enabled based on the DAD in the DAD File. It checks 
whether the tables specified in the DAD exist. If the tables 
do not exist, the XML System will create the tables accord- 
ing to the specification in the DAD. The column name and 
data type is taken from the RDBnode of an attribute_node 
or text_node. If the tables exist, the XML System will check 
whether the columns were specified with the right name and 
data types in the corresponding tables. If a mismatch is 
found, an error will be returned. The tablespace is optional, 
but it is specified if the collection tables are to be created in 
a tablespace other than the default tablespace of the specified 
database. 

[0823] The enable_collection is required for decomposi- 
tion stored procedure dxxInsertXML( ), and its pairing 
dxxRetrieveXML( ), and the dxxUpdateXML( ). For stored 
procedure dxxGenXML( ) and dxxShredXML( ) which take 
a DAD as input, the enablement of an XML collection is not 
required. For the latter stored procedures, it is assumed that 
all tables specified in the DAD exist in the database already. 
If they don't exist, an error will be returned. Tie enable_c- 
ollection does have a pairing disable_coUection option. But 
the operation of disable collection is much simpler. It just 
deletes the collection from XML_USAGE table. 

[0824] As discussed in Section E, FIG. 7 is a diagram 
illustrating code organization to compose XML documents. 
The dxxGenXML( ) stored procedure 700 and the dxxRe- 
trieveXML( ) stored procedure 702 each invoke the dxx- 
ComposeXML( ) module 704. Decision block 706 deter- 
mines whether RDB_node mapping is to be performed. If 
RDB node mapping is not to be performed, processing 
continues with SQL to XML module 708 and the SQL to 
XML mapping is performed by the SQL to XML mapping 
technique 710. If RDB node mapping is to be performed, 
processing continues with the dxxGenXMLFrom RDB( ) 
module 712 and RDB_node mapping is performed by the 
RDBR technique. 



[0825] F3 Using RDB Node Mapping Scheme 

[0826] The mapping between composed/decomposed 
XML documents and an XML collection is specified in the 
Xcollection of a DAD. The XML System adapts the notation 
used in XSLT and uses a subset of it to define the XML 
document structure. In order to facilitate the mapping, the 
XML System introduces the element Relational DataBase 
node (RDB node) to the Xcollection. 

[0827] The DAD defines the XML document tree structure 
using seven kinds of nodes defined by XSLT/XPath: 

[0828] root_node 

[0829] elementjaode 

[0830] text_node 

[0831] attribute_node 

[0832] n amespace_node 

[0833] processing^instruction_node 

[0834] comment_node 

[0835] For simple and complex compositions, the RDB- 
_Node is used to define where the content of an XML 
element or value of an XML attribute is to be stored or 
retrieved. 

[0836] The RDB Node has the following components: 

[0837] Table: the name of the relational table or 
updateable view, in which the XML element content 
or attribute value is to be stored. 

[0838] Column: the name of the column which con- 
tains the element content or attribute value. 

[0839] Condition: the predicates in the WHERE 
clause to select the desired column data. 

[0840] In the definition of an Xcollection, for this embodi- 
ment of the invention, the following approach is used to 
define the mapping: 

[0841] RDB node Mapping: Specify RDB_node for 
each text_node and attribute_node, and the root 
element_node in the XML data model. In this case, 
the XML System will generate SQL statements 
based on the RDB_nodes and document tree struc- 
ture. 

[0842] The text_node and attribute_node will have a one- 
to-one mapping to/from a column in a relational table. 
Therefore, each of them will have a RDB node to define the 
mapping, where the RDB_node is needed for the RDBjiode 
mapping. It is possible that an element_node has no text- 
_node but only child element_node(s). 

[0843] Using a RDB node to specify each textnode and 
attribute_node is more general. Only the root element_node 
needs to have a RDB node. In this RDB_node, the user is 
required to specify all tables used to compose/decompose 
data, as well as ajoin condition among these tables. The 
condition predicate in this RDB_node will be pushed down 
from the root element_node to all child nodes. Ideally, the 
way to tie all tables together within an XML collection is the 
primary-foreign key relationship. However, it often happens 
that some existing user tables do not have such a relation- 
ship. Therefore, requiring the foreign key relationship for 
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composition is too restrictive. However, in the case of 
decomposition, if new tables are created for storing the 
decomposed XML data, then the DAD requires a user to 
specify the primary key of each table, as well as the 
primary-foreign key relationship among tables. 

[0844] F.4 Detailed Techniques 

[0845] The following discussion focuses on the technique 
for one embodiment of the invention. There are two major 
phases of the technique: the preparation phase and the 
generating phase. 

[0846] F.4.1 Preparation Phase 

[0847] In this phase, the relational structure is generated 
by processing the RDB_node of the root element node. It is 
required that in this RDB node, all tables contribute data to 
the XML document to be listed, as well as the join condi- 
tions between these tables. 



[0848] In this phase, a mapping is added between a 
relational column and an XML attribute value or element 
text to the relational table structure, so that the technique 
tracks where the relational data is from. 

[0849] F.4.2 Generating Phase 

[0850] From the root element_node, the technique 
traverses the DAD DOM Tree, using the relational infor- 
mation recorded in a RELdata structure prepared in the first 
phase to generate a SQL statement. Then the data selected is 
used to fulfill the XML attribute value or text of an element. 

[0851] F.43 Data Structures 

[0852] The following data structures are used by the 
invention: 

[0853] The following data structures are used for decom- 
position of XML documents using RDB-nodes. 



/• 

* Column to XML node pair 

• ♦/ 

typedef struct col2xml { 

char col[DB2_TAB_COL_VIEW_XEN]; 

char data!Vpe{DB2_TAB_COL_VIEW_J J EN+«]^* data type of the column *l 
char xmJ{DXX^XML_FIELD_SIZEi/* xml can be attribute or element name •/ 
int xmJType;/' DXX^ATTRIBUTE or DXX__TEXT V 
int nnlLevel;/* for determining if a column 

needs to be copies to the 

next row */ 
int indexInSQLDA; 
} DXX__COL2XML; 

r 

* Table and Column pair 

• ♦/ 

typedef struct tabcol { 

char tabf DB2_TAB_COL_VIEW_LEN]; 

char col[DB2_TAB_COL_VIEW„LENl; 

int indexInSQLDA; 

} DXX_TAB_COL; 

/* 

* Join information 

• •/ 

typedef struct join_info { 

char col[DB2_TAB_COL_VIEW_LEN]; 

/* column used in the join condition */ 
int num_jouV* number of foreign columns to be joined */ 
DXX_TAB_COL foreign(DXX_NUM_FORHGN]; 

/* foreign column and its table to be joined */ 
} DXX_JOIN_JNFO; 
/• 

* Primary key Information: 

* */ 

typedef struct prLJccy { 

int num_col;/* number of columns in the key •/ 
char rcune[DXX^UM_JdAPPrNGlDB2 

/* the column names of the primary key */ 
} DXX PRIKEY; 
/. 

* Foreign key Information: 

* V 

typedef struct for_key { 

int num_col;/* number of columns in the key •/ 
DXX_COL2XML col[DXX_NUM_MAPPlNG]; 

/* the columns of the foreign key */ 
} DXX_FORKEY; 
/• 

* Table Information: 

• */ 

typedef struct tab { 

char namej DB2_TAB_COL_VlEW_LEN \t* name of the table */ 
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DXX_PRIKEY prL-key-,/* primary key •/ 
DXX—FORKEY foi_key;/* foreign key •/ 
int levcU/* relational level of the table 7 
char top_clcmcntj DXX_XML_FIELD_SIZE |; 

/* the highest level of XML clement using 

* column data of this table */ 

char sql_5tmt(MAX_STMT_LENt/* SQL statement 7 
SQLHSTMT hsbmV* CLI statement handle 7 
struct sqlda *sqldaPuV* pointer of selected data */ 
int mim col;/* number of columns whic data used in the 

* XML document 7 
DXX_COL2XML col2xml[DXX_NUM_MAPPINGl 

/* mapping between column in this table to 

* XML attribute or element text 7 

int num_jouu/* number of columns in this table which 

* will form join conditions 7 

DXX_JO[N_INFO join_info(DXX_NUM_10IN_IN_TABLEl/* join information 7 
char condition(DXX_CONDrnON_IN_ ilEL]; 
/* condition of select for columns in this 

* table 7 

} DXX_TAB; 

/• 

* Relation Information for entire XML documents 
* 7 

typedef struct rel { 

int current level;/* the current XML level during treversai 7 

int num_tablesV* number tables to generate/decompose XML doc */ 
DXX_TAB *tab(DXX_NUMTAB _JN _RELL/* details of each table */ 
char **top_elements; /* index to tab(i]->top_element for fast search. */ 
SQLHDBC hdbcV* CLI connection handle 7 
} DXX_REL; 

/* 

* Data structure to store a row 

• •/ 

def struct row { 

int num col; 

char **coldata;/* Array of pointers 7 
} DXX_ROW; 

I* 

* Data structure to store the rows 
• */ 

typedef struct rows { 
int num_rowB; 

DXX .ROW • row( DXX_MAX_ ROWS]; 
} DXX_ROWS; 



[0854] F.4.4 Pseudocode 

[0855] The following is sample pseudocode for one 
embodiment of the invention: 



dxxGenXMLFromRDBChdbc^dad, resulLJabname, orveirideType,override,m,n) 
hdbc odbc handle 

dad DOM tree of document access definition (DAD) 

result__tabname — the name of the table into which the resulting XML documents will be stored 
overrideType type of override 

override string containing the override conditions 

m the maximum number of documents/rows to return; 0 if none. 

n OUT: the actual number of documents/rows generated 

Note that in the following technique, m and n are ignored in one embodiment. 
dxxGenXMLFromRDB^dbCfdad, result_tabname,overrideType,override m, n, ) 
{ 

working variable: 

charfj beading_buf; 
DOMtree dad; 
DOMNode top_element_node; 
DXX_REL relation_info; 
DXX__REL *rel - &relation_info; 
DXX_OVliRRlDli dxx_override; 
DXX_OVERRfDE_COND *dxx_override_ptr - 
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(DXX_OVERRIDE__OOND*) &dxx_override; 
/* initialize override data sturcturc */ 
dxxInitOvcrridc^vcrridcTypc.ovcrriUc^dri^ovcrridc); 
/* set relational structure */ 
sctRel(rcl,dad); 

rooL_element_jiode - get first elcmeni__node from dad; 
/* process relational information */ 

proc^_rcl(rcl^x»t_clcmcnt„iMxfc,abtt_ovcrride); 

/* Retrieve the header information- */ 

Get the prolog and doctype from dad by DOM API, 

and write them to heading_buf; 
/* process clement_nodc and generate XML "/ 
pToces8_roo1^elernent(hdrx:,rcl 
heading_buf); 

} 



[0856] Initialize Override Data Structure 

[0857] This is used for XMLOVERRIDE. Parse the input 
override parameter according to the overrideType, then 
break the conditions into an array structure, where each 
entry has a path and predicate. 



dzxIriitOverride(overrideTVpe^verride,dxx_ovenide) 
{ 

Check overrideType; 

if ( overrideType — XML_OVERRIDE ) { 
loop through input override string { 
parse override path expression and fill into dxx_override array entry; 
store path in dzx_override entry's path- 
store predicate in dxx_override entry's predicate; 
} 

} 

} 



[0858] Setup Relational Structure 

[0859] This routine will process the RDB node of the top 
element code and initialize the REL structure of entire XML 
documents. After the process, all tables involved in com- 
posing/decomposing XML. Documents should be included 
in the REL structure, and relationship between tables should 
also be recorded in the REL. 



setRel(reLdad, dxx_typc) { 
working variable: 

DXX_TAB *current_Jab; 
get the RDB__node of the top__element_node; 
if no RBD_node 
return error, 
for each table in RDB_node do { 
allocate space for currcnt_tab; 
curreat_tab->aame - RDB_node's table name; 
if (dxx_typc — DXX_DECOMPOSE) 
current__tab->primary_Jcey_name - RDB_node's table key, 
current_tab->top_element - NULL; 
current_tab->level - 0; 
current _ tab- >Qum _col - 0; 
current__tab->selectDaPtr - NULL; 
curren t_tab- >co oditio n - NULL; 
if( RDB_oode's condition !- NULL) 
scan the condition of RDB_node, 
current_tab->nurrL_join - 0; 
for each currcnt_tab .column in the predicate do{ 
i - currenl_tab->num__joiji; 



•continued 



airreat__tab->join_uifo(i].col - curent_tab.column; 
current, tab->joio_Jiifo{i].niim_Join - 0; 
for each other_tab.col - current_tab .column do{ 
j - cuixent_tab^>join_Jnfc{i].num_40ui; 
current_tab->joui_jnfo[ij,foreigntjjxol - col in other_tab 
cwrrent_tab->join_info{i].foreign(j].tab - other_ tab 
cunrnl_tat>>joiii_Jnfa{i].forei^ 
current__tab->join info[iJ.mim join++; 

} 

cunent_tab->nunx_goin++; 

} 

rel->tab(i] - current_tab; 

rcl->num_tablea++; 

} 

Example: 
rel->tab[0].name - "order _tab"; 
rel->tab[0].prirj^y_Jtey_name - "order_key"; 
rel->tab[03.level - 0; 
rcl->tab[0).top_clement - NULL; 
rel->tab{0LselectDaPtr - NULL; 
rel->tab[0].num_col - 0; 
rcl->tab[0).num__join - 1; 
rel->tab[0)->join_Jafo[0>>col - "order_Jcey"; 
rel->tab{0]->iom_jnfo[0}->num_goin - 1; 
rel->tab[0]->ioin_jnfo[0}->foreigri[0].col - "o_Jcey**; 
rel->tab{0>>ioin_Jnfo[0)->foreigr40].tab - "part_tab"; 
rel->tab{0}->joiii^nfo[0>>foreign[0].indexIriSQIX>A - DXX_NONE; 
rcl->tab(0]->condition - NULL; 
rel->tab(lj.nanie - •*part_tab"; 
rcl->tab( 1 J.primary_key_jaamc - **part_Jcey"; 
rel->tab[l].level - 0; 
rcl->tab[l j.top_element - NULL; 
rcl->tab[lj.selectDaPtr - NULL; 
rcl->tab{l].num_col - 0; 
rel->tab[ 1 J.Qum_join - 2; 
rel->tab[l}->join_uifo{0}->col - "o_kcy"; 
rel->tab[l}->join^jnfo[0]->nuni_join - 1; 
rel->tab[l]->join_Jnfo[0]->foreign[Ol.col - "order_Jcey"; 
rcl->tab[l]->join_iiifc{0]->foreign[OJ.tab - "orde^tab"; 
rcl->tab{l>>joiri_jiifo[0^ - DXX_J*ONE; 

rel->tab[l}->join_info(l)->col - 44 part_key"; 
rel->tab[J}->join_info[l}->nurn_join - 1; 
rel->tab[l]->joiri_info[l]->forcigii[Ol.col - "p_Jrey"; 
rel->tab[l]->join_Jiifo[lJ->foreigntO].tab - M ship_tab"; 
rcl->tab[l3->join_j^o[l^>foreignloiirjdci£nSQLDA - DXX_NONE; 
rel->tab[l}->condition - NULL; 
rel->tab[2j.name - "ship_tab"; 
rel->tab[2J.primary_key_iianie - "ship_date"; 
rel->tab[2j.top_clcmcnt - NULL; 
rel->tab|2J.level - 0; 
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rcl-Xnh(2]-&clcctDaPtr - NULL; 
rcl->tab(2).num_col - O, 
rcl->tab^2|.num„join - 1; 
rel->tah(2>>join_info[0)->col - "p_iey"; 
rcl->tab(2]->join_info[0)->aiiin^ioin - 1; 
rel->tab(2>>joiix_infb(0>>forign(0].col - "part_Jccy"; 
rel->tob(2>>join_Jnfo(0)->forign(0J.tab - "part_tab"; 



-continued 

rel->tab[2>>join_Jnfo[0}->foreigB[0lindexInSQLDA - DXX_NONE; 
rel->tab(2)->condition - NULL; 
rel->Mim__tables - 3; 



[0860] Process Relational Information 
[0861] This is the second phase of the preparation process 
together all information to generate SQL statements. It 
recursively processes each RDB node for each attribute, 
text, and element node of DAD, and records the mapping 
relationship into the REL data structure. 
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rel->tab[2]->join_info[0]->forign[0].tab = "partjab"; 
rel->tab[2]->join_info[0]->foreign[0].indexInSQLDA - DXX_NONE; 
rel->tab[2]->condition = NULL; 

rel->num_tables = 3; 



Process relational information 

This is the second phase of the preparation process together all information to 
generate SQL statements. It recursively processes each RDB_node for each attribute, text, 
and element node of DAD, and records the mapping relationship into the REL data structure. 

process_rel(rel, dom_node,current_level, dxx type) 
{ 

/* The first pass fills in everything in rel, 

except foreignkey. */ 
process_rell(rel, dom_node,current_level, dxx_type); 

if (dxxjype = DXX_DECOMPOSE) { 
/* The second pass fills in the foreign key of each table 

in rel and update num col. */ 
process_rel2(rel); 

} 
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findTop£Iements(rel, domjnode, dxxtype); 
Create a hash table or sorted array of all top_elements 
rel->top_elements for fast search. 

} 



process jrell(rel, dom node, dxx type) 



{ 

if (dom node type = element_node) { 
rel->current_level++; 



child = DOMGetFirstChild; 
while (child != NULL ) { 

processrel 1 (rel ,child,dxx_type); 

child = DOMGetNextchild; 

} 



} 



if (type = attributenode || 
type — text_node || 

(dom_node is a leaf element node && DXX DECOMPOSE)) { 
call DOMgetChild to get the RDB_node of domnode; 
if ( there is no RDB_node ) 
return error, 
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call processRDBnod^reljdomnode, RDB_node,dxxJype); 

} 



if (domnode type = elementnode) 
rel->current_level--; 

} 



/* Fill in the foreign_key of each table in rel. */ 
process_re!2(rel) 

{ 

/* Woiking variables: */ 
DXX TAB *foreignTab; 



for (tablndex = 0; tablndex < rel->num_tables; tablndex++) { 
currenttab = rel->tab[tablndex] ; 
for (i = 0; i < current_tab->mim J oin; i++ ) { 
for ( j=0; j< current_tab>join_info[i].nuinJoin; j++ ) { 
foreignTab = findTabByName(rel, 

ciirrent_tab->join_info[i].foreign[j] .tab); 
if (foreignTab->level < current_tab->level) { 
strcpy(current_tab->foreign_key.col, 

current_tab->join__info[i] . foreign[j ] .col); 
Find k such that foreignTab->col2xml[k].col 

equals current_tab->foreign_key.col; 
current_tab->foreign_key.xml = 
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foreignTab->col2xml[k].xml; 
current_tab->foreign_key.xmlType = 

foreignTab->col2xml[k].xmlType; 
current_tab->foreign_key.dataType = 

foreignTab->col2xml[k].dataType; 

} 

} 

} 

} 

} 

findTopElements(rel, dom_node, dxx type) 
{ 

/* findTopElements finds the top-element for each table in rel. */ 



/* findTopElements assumes that process rell has been called. */ 
if (dom_node type = element_node) { 
rel->cuirent_level-H-; 



child = DOMGetFirstChild; 
while (child != NULL ) { 

findTopElements(rel,child,dxx_type); 

child = DOMGetNextchild; 

} 

} 
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if (type = attribute_node || 
type = text_node || 

(dom node is a leaf element node && DXX_DECOMPOSE)) { 
call DOMgetChild to get the RDB node of dom_node; 
if ( there is no RDB_node ) 
return error; 

call findTopElement(rel,dom_node, RDB_node,dxx_type); 

} 

if (domjiode type = elementnode) 
rel->cuirent_Jevel— ; 

} 

Example: 

rel->tab[0].name = "ordertab"; 
rel->tab[0].primary_key_name = "order key"; 
rel->tab[0].foreign_key.col = ""; 
rel->tab[0].top_element = "Order"; 
rel->tab[0].selectDaPtr = NULL; 
rel->tab[0].num_col = 3; 
rel->tab[0]->col2xml[0].xml= "Key"; 
^el->tab[0]->col2xml[0].col= ,, order_key , '; 
rel->tab[0]->col2xml[0].dataType="intege^• , ; 
rel->tab[0]->col2xml[0].xmlType=DXX_ATTRIBUTE; 
rel->tab[0]->col2xml[0].xmlLevel=0; 
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rel->tab[0]->col2xml[l].xml= "name"; 
rel->tab[0]->col2xml[l].col="customer_name"; 
rel->tab[0]->col2xml[l].dataType= ,, char(128)"; 
rel->tab[0]->col2xml[l ] jonlType=DXX_TEXT; 
rel->tab[0]->col2xml[l ].xmlLevel=2; 
rel->tab[0]->col2xml[2].xml= "email"; 
rel^tabtOJ-x^nxmipi.col-'custome^email"; 
rel->tab[0]->col2xmI[2].dataType= ,, chai<128)"; 
rel->tab[0]->col2xml[2].xmlType=DXX_TEXT; 
rel->tab[0]->col2xml[2].xmlLevel=2; 
rel->tab[0].numjoin = 1; 
rel->tab[0]->join_info[0]->col = "order_key"; 
rel->tab[0]->join_info[0]->numjoin= 1; 
rel->tab[0]->join_info[0]->foreigti[0].col = "ojcey"; 
rel->tab[0]->join_info[0]->foreign[0].tab = "part_tab"; 
rel->tab[0]->jom_info[0]->foreigQ[0].indexInSQLDA = DXX 
rel->tab[0]->condition = NULL; 

rel->tab[l].name = "parttab"; 
rel->tab[ 1 ].primary key name = "part key"; 
rel->tab[l].foreign_key.col = "orderkey"; 
rel->tab[l].foreign_key.dataType = "integer"; 
rel->tab[l].foreign_key.xmJ = "Key"; 
rel->tab[l].foreign_key.xmlType = DXXATTRIBUTE; 
rel->tab[l].top_element = "Part"; 



US 2002/0123993 Al Sep. 5, 2002 

45 



rel->tab[l].selectDaPtr = NULL; 

rel->tab[ 1 ].num_col = 5; 

rel->tab[l]->col2xinl[0].xml = "Color"; 

rel->tab[l]->col2xml[0].col = "Color"; 

rel->tab[l]->col2xml[0].dataType="char(5)"; 

rel->tab[l]->col2xml[0].xmlType=DXX_ATTRroUTE; 

rel->tab[l]->col2xml[0].xmlLevel=l ; 

rel->tab[ 1 ]->col2xml[ 1 ].xml= "Key"; 

rel->tab[l]->col2xml[l].col="part_key"; 

rel->tab[ 1 ]->col2xml[ 1 ].dataType=" integer" ; 

rel->tab[l]->col2xml[l].xmlType=DXX_TEXT ; 

rel->tab[l]->col2xml[l ].xmlLevel=2; 

rel->tab[l]->col2xml[2].xml= "Quantity"; 

rel->tab[l]->col2xml[2].col="qty"; 

^el->tab[l]->col2xInl[2].dataType="integer' , ; 

rel->tab[l]->col2xml[2].xmlType=DXX_TEXT; 

rel->tab[l]->col2xml[2].xmlLevel=2; 

rel->tab[l]->col2xml[3].xml= "ExtendedPrice"; 

rel->tab[ 1 ]->coI2xml[3].col="price"; 

rel->tab[l]->col2xml[3].dataType="real"; 

rel->tab[l]->col2xml[3].xmlType=DXX_TEXT; 

rel->tab[l]->col2xml[3].xmlLevel=2; 

rel->tab[l]->col2xml[4].xml= "Tax"; 

rel->tab[l ]->col2xml[4].col="Tax"; 

rel->tab[ 1 ]->col2xml[4].dataType="reaI"; 
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rel->tab[l]->col2xml[4].xmlType=DXX_TEXT; 
rel->tab[l]->col2xml[4].xmlLevel=2; 
rel->tab[l].numJoin = 2; 
rel->tab[l]->join_info[0]->col = "ojcey"; 
rel->tab[l]->join_info[0]->type = "integer"; 
rel->tab[l]->join_info[0]->numjoin = 1; 
rel->tab[l]->join_iitfo[0]->foreign[0].col = "orderjcey"; 
rel->tab[l]->join_iiifo[0]->foreign[0].^^ = n oider_tab"; 
rel->tab[l]->jom_^ = DXX_NONE; 

rel->tab[l]->join_info[l]->col - "partkey"; 
rel->tab[l]->join_info[l]->type = "integer"; 
rel->tab[l]->join_mfo[l]->niiniJoin = 1; 
rel->tab[l]->join_info[l]->foreigri[0].col = "p_key"; 
rel->tab[l]->join_info[l]->foreign[0].tab = "ship Jab"; 
rel->tab[l ]->join_info[l]->foreign[0].indexInSQLDA = DXX_NONE; 
rel->tab[l]->condition = "price > 2500.00"; 

rel->tab[2].name = "shiptab"; 
rel->tab[2].primary_key_name = "shipdate"; 
rel->tab[2].foreign_key.col = "part_key"; 
rel->tab[2].foreign_key.dataType = "integer"; 
rel->tab[2].foreign_key.xml = "Key"; 
rel->tab[2].foreign_key.xmlType = DXXTEXT; 
rel->tab[2].top_element = "Shipment"; 
rel->tab[2].selectDaPtr = NULL; 
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rel->tab[2].num_col = 2; 

rel->tab[2]->col2xml[0].xml= "ShipDate"; 

rel->tab[2]->col2xml[0].col="date"; 

rel^tabPl-X^OxmlfOl-dataType^'date"; 

rel->tab[2]->col2xml[0].xmlType=DXX_TEXT; 

rel->tab[2]->col2xml[0].xmlLevel=3; 

rel->tab[2]->col2xml[l].xml= "ShipMode"; 

rel->tab[2]->col2xml[l ].col="mode"; 

rel->tab[2]->coI2xml[l].dataType="char(128) ,, ; 

rel->tab[2]->col2xml[l].xmlType=DXX_TEXT; 

rel->tab[2]->col2xml[l].xmlLevel=3; 

rel->tab[2].num_join = 1; 

rel->tab[2]->join_info[0]->col = "p_key"; 

rel->tab[2]->join_info[0]->type = "integer"; 

rel->tab[2]->join_info[0]->numjoin= 1; 

rel->tab[2]->join_info[0]->forigB[0].col = "part_key"; 

rel->tab[2]->join_info[0]->forign[0].tab = "partjab"; 

rel->tab[2]->joinjbofo[0]->foreign[0].indexInSQLDA = DXXNONE; 

rel->tab[2]->condition = "date > *1966-01-01"'; 

rel->num tab 1 es — 3; 

processJRDB_node(rel,dom_node, rdb node, dxx type) 

{ 
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DXXTAB *mytab; 
int i; 



/* It is attributejiode, textnode, or a leaf element node in DAD. */ 
/* Note: table must be specified in RDB jiode */ 

mytab = rel->tab[?] such that mytab.name = table of RDB_node 

if no mytab found 
return error; 

i = mytab->num_col ; 

mytab->col2xml[i]xol=column of RDBnode; 
mytab->col2xml [i] . xmlLevel = current level; 
if ( dxxjype = DXX_DECOMPOSE ) { 

mytab->col2xml[i].dataType= type ofRDBnode; 

} 

if ( it is an attributenode ) { 
mytab->col2xml[i].xml=name of the attribute 
if (dxx_type = DXXJ5ECOMPOSE) 
mytab->col2xml[i].xmlType = DXXATTRIBUTE; 

} 

else if (it is a textjaode) { 

mytab->col2xml[i].xml=name of the parent element node; 

if (dxxjype = DXX_DECOMPOSE) 
mytab->col2xml[i].xmlType = DXXJEXT; 
} else { /* It is a leaf element_node in DAD. */ 

mytab->col2xml[i],xml=name of element_node; 
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if (dxxtype — DXX.DECOMPOSE) 
mytab->col2xml[i] jcmlType = DXXELEMENT; 

} 

mytab->num_col-H-; 
if ( condition != NULL ) 
add condition to mytab->condition; 

} 

Definition of a qualifying parent: 

The parent p of an element qualifies for a table t if all of the following four conditions 
are met; 

1. p exists. 

2. p is intermediate. 

3-p has multioccurrence = YES 

4.p does not have any child mapped to a table other than t 

Intuitively, if a parent qualifies for a table, it is a candidate to be chosen as the top- 
element of the table. 

Definition of top-element: 

Let c be the highest element in DAD that belongs to a table t. 
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[0862] Definition of a Qualifying Parent: 

[0863] The parent p of an element qualifies for a table t if 
all of the following four conditions are met; 

[0864] 1. p exists. 

[0865] 2. p is intermediate. 

[0866] 2. p has multi_occurrence--YES 

[0867] 4. _p does not have any child mapped to a 
table other than t. 

[0868] Intuitively, if a parent qualifies for a table, it is a 
candidate to be chosen as the top-element of the table. 

[0869] Definition of Top-Element: 

[0870] Let e be the higest element in DAD that belongs to 
a table t. 

[0871] If e does not have a parent qualifying for t, then e 
is the top-element of t. 

[0872] Otherwise, the top-element of t is the highest 
element in the parent chain of e that does not have a parent 
qualifying for t. 



findTbpElcmcni(reIvdom_Dode, rdb_node, dxx_type) 
{ 

working variables: 
DXX_TAB •mytab; 



-continued 



int i; 

/* It is attribute_node, text_node t or a leaf clemeat_node in DAD. */ 
/• Note: table must be specified in RDB^node 7 

mytab - rel->tab{?] such that mytab.name - table of RDB_ix>de 
if do mytab found 
return error; 
if ( mytab->top_element — NULL) { 

parent node - parent element__0ode of this dom node; 

grandpare nt_node - parent of the parent__node; 

while ( grandparent node exists && 

grandparent node is intermediate && 

grandparent __node has multiple occurrence Sc& 

grandpare nt node does not have children mapped to 

other tables) { 

parent_node - grandparent node; 

grandparent _node - parent of the grandparent_rjode; 

} 

mvtab->top_clement * name of the parenL_jJodc; 

} 

} 

[0873] Process Root Element 

[0874] This is the second phase which will traverse the 
DOM tree to generate XML document. 



process _Root_element(SQLHDBC hdbc, 
DXX_REL *rel, 
DOMnodc root_element__node, 

char "result tab name, 

char *heading_buf, 
int m, 
int *n) { 

working variables: char[] clemenL^stmt /* temp stmt for element •/ 

DxxBUf outbuf; 

charJJ elcmcnL_name; 

char {} top_sqL_stmt; 

SQLDA •selectPtr, 

DXX_TAB •current_tab 
/* A node is trivial if it is intermediate with muluL_occur — - NO. "/ 

/* Being trivial is the only case when current tab is not needed. */ 

if (root_element is trivial) { 

prepare outbuf; 

append heading_buf to outbuf; 

process_dement_noOe(rxirx;,rcl,NUL^ 
insert outbuff into result_table 
release outbuf 
} else { 
Get element _name; 
Find current_lab such that 

rel->table(i].top clement — clcmcnt_name 
if (current_tab is not found) 

djczFrnanabForIntermedkte(root_dcmcnt_nodc, rcl, &currenl_Jab); 
current_tab->level - 0; 
rel->cunent_lcvel = 0; 

Call gerjcrate_SQ^rel,currcnt_tab,top_sqL_stmtX 

execute top_sql_stmt by hstmt, get data into selectPtr; 
/• loop V 

for ( i = 0; ton; i++) { 
fetch on hstmt; 

current_tab— >sqldaPtr = selectDaPtr, 
cuntnt_tab-> hstmt - hstmt; 
prepare outbuf; 

append heading_buf to outbuf 
process_element_n<xie(hdbc^d,cun-e^ 
insert outbuff into result_table; 



US 2002/0123993 Al Sep. 5, 2002 

51 



-continued 



release outbuf; 
}/* end of while loop •/ 
} /* endif intermediate clement with multi__occurrence — NO */ 
Free hstmt, all allocated space, etc 
> __ 

Find the Table for an Intermediate Element with multi occurrence YES 

cxxFmdT^bForlntermediatc(DOM_Node elementNode, /• input 7 
DXX_REL *rel, /* input •/ 
DXX^TAB "*current_tab) /* output 7 



{ 



Find the Mist current tab by 

traversing down elementNode breadth-first such that 
rcl->table[i].top_element — sub_clcment name 

if (curreat_tab not found) 

Error: no way to generate the multiple occurrences 
because there is no table to fetch from. 



[0875] Generate SQL Statement 

[0876] Generate SQL statement using the rel sturcture 
during the first phase. Two keys there: 

[0877] 1. How to add additional columns to e 
selected in order to pass their values down to low 
level SQL statements; 

[0878] 2. How to add additional predicates to the 
where clause by setting the join value. 



genera tc__SQL_jstmt(rcl, /* relational info 7 
dxx^tab, /• input 7 
sqLstmt) /* output 7 

{ 

working variables: 

charf) select_stmt; 
charfj fromwhere_stmt 
int indexInSQLDA; 
chaiQ [Value; 
inii sql_stmt; 
/* generate select clause */ 
sprintf(sclect_stmt, "select"); 
for (i^>, i<dxx__tab->num„col; i++) { 
sprint£[selcct_slmt "^** ( dxx_tab->col2xmlIi^colX 
dn_tab^>col2xml[i].indexInSQLDA * i; 
} 

/* identify column values need to be passed down 7 
indexInSQLDA - dxx_rab->num_col; 
for Q. - f>,.i <dxx t _tab->num_join; i++) { 

check whether dxx.. tab->join._info(i]-col is in the 
cbat_tab- >col2xmlQ array; 

if not { 

then /* code may need to select it "/ 

for (j-O; j<dxx tab->join info(iJ.num join; j++) { 

check whether the dxx_tab- >join info{i].foreignfj].tab' level is 
less than current_tab->lcvcl; 

if yes: 

strcat(select__stmt dD_tab->join_infofi].col); 

di^rab->join_inf|il->indexInSQLDA - indexInSQLDA; 

indexInSQLSA++; 

break; /* out from inner loop 7 

} 

} 

} /• end of for 7 
/* generate from and where clause 7 
sp rintf (fro tnwhe re_s tint,** fro m %& where %s\n", 

dtx__ table- >oame, dn_table->condition); 

/* identify values got from parent query to be put in the condition 
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This is the join condition "/ 
for (i - 0; i < dxx__lab- > aum_j o in; { 

for Q^y, j < dxx^lab>join_Jnfo[i]jium_join; j++) { 

check whether the dxx_tab->join_Lafo{ij.forcigii|j|.tab , s level is 
greater than currcnl_tab->level; 
if yes then { 
get/vahic(rel,&d3X_tab->join uifctilforeignfjl 

dxx_tab~>oame > dxx_tab->join_info(i].coI,&jV^luc); 

} 

add "AND dxx_Lab->join infbfilcol - jValue"to from where _&tmt; 

} 

} 

strcat(sqL_stint, sclcct__stmt); 
stzcai(Bql_stint, fromwhcrc_stmt); 
8trcpy((irx_tab->sql_8tmt, sql_stmt); 

} 

Example: 

1. select order_Joey, customer_n&me, custo me r_name 
from ordei_tab 

2. select part_Jcey, price, qty, tax from 

part_tab where price > 2500.00 AND order_Jtcy - jV&lue 

3. select date, mode, comments from ship_tab 
where date > 1996-0601 AND part_key - j'Wue 



Get Join VWue 

Get the value from higher level query to from the 
geUvalue(DXX_REL "rel, 

DXX_TAB_COL *join_col, 
char •current_tabname, 
char *current__colname, 
char ••j\Wue){ 
working variables: 

DXX_TAfl •foreign_bib; 
DXX_TAB_OOL *foreign_col; 
int indexInSQLDA; 

1. find forcign_tab - rel->tablc{?] such that its name - 
join col->tab; 

2. find forcign_col-fc«ign_tab->col2xml(?].col, 

such that foreign_tab->col2xin(?)-join_col.col; 

3. if foreign_col is not found 

then find foreign__col-foreign_tab- >join_info( ilforeignfj ).col such that 
foreign_tab->join_infc{i].foreig^J.col-<arrent_ro AND 
foreign_tab^>jom_Jirfo[i].fore 

4. indexInSQLDA - f ore ign_ _co LindexInSQLD A; 

5. get the j"Vaiue from parent_jab->selectDaPtr using indexInSQLDA; 

6. return jAfelue 

> 

Example: 

1. For the 2nd Sql_*tmt: 

0. join_col->col ■ "order_key F * 
join__col->tab - "order_tab** 

1. parent_tab->name - "order tab" 

2. parent col->parciit_tab->info(0].foreign[0] 

parent__col->tab - "order tab" 

parent col->col - "order key" 

3. parcnt_col->indexInSQLDA * 0; 

4. j 'Value can be "1". 



Process Element Node 

Process element node and generate the eletnent_stmt 
p rocess_elemenL_node(hdbc > 
rel, 

current_tab, 

elemenl_node, 

outbuf) 

{ 

working variable: 

SQLDA selectDaPtr; 

int isTableNew - FALSE; 

int hasNotSeenNonAttr - TRUE; 

int isLastChildElement - FALSE; 

int isChildNewTopElement - FALSE; 

int isChildAttrNode - FALSE; 

int isChildlnCurrentTab = FALSE; 

int isChildlnlcrmediatc - FALSE; 

int isCbildTrivial = FALSE; 
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rcl- >current_Jevel++; 

get clement _name for element_node; 

append and elemenl name to outbuf; 

call DOMgctChUdNodcs to get a list of all children 
of this element__node. 
for each child_node do { 
/* Set the boolean variables for determining 
if this child should be generated from 
a table different from the current one. */ 
Set isOiildlntermcdiate to TRUE if child_node does not have any text or attribute child. 

/■ A child is trivial if il is intermediate with multi occur — NO. */ 

/■ Being trivial is the only case when current tab is not needed. 7 

Set isChudTrrvial to TRUE if the child_nodc is trivial, 
if (current_tab !- NULL) { 
set isChildlnCurrenrJhD to TRUE, if the child_node's name equals 
to currenL-tab's top element or in its coI2xml[}; 

}ebe{ 
if(isCluldTrivial) 

isdildlnCUrrcntTab - TRUE; 
else 

isChUdlnCunenflab - FALSE; 

} 

set isChildAUrNode to TRUE if the child_node is the attribute_node; 
if (lisChildlnCurrenfTab) { 
looping rel->tab(] to see whether the child_node is the 

top_elcment of some rel->tab{?); 
if found such ret-tab(?] and rel->tab(?)->sqldaPtr-NULL 
set tsChildNcwlbpElemcnt to TRUE; 
current^. tab » rel->tab(?J 

} 

} 

/* Check if this child should be generated from 

a new table. */ 
/" An element node docs not need a new table 
unless (it is a new top element || 
(currenLJab— NULL && child is not trivial)) 7 
/* An attribute node needs a new table 
only if its value can not be retrieved 
from the current table. */ 
/* A text node never needs a new table. 7 
if (ut ChildNrwTop Element |] 
(current_tab— NULL && tisChildTrivial) [| 
((isChildAttrNode) &.& 

OisChildlDCurrentTab))) /* Handle new table. 7 { 
isTableNew - DXX^TRUE; 
/* Find the right table. 7 
if(! chfldNewTopElement) { 

currc nt__tab - rel->table{i] such that 

currc nl tab— xx>12xmi[ ? Lxml «■ child node's name 

if(currcnt_tab is not found && isChildlntermediatc) 
dxxFind7abForintennediate(child node, rcl, & currc nl_lab); 

} 

if(currcnt tab->sqL_stmtfO>— '\0*) { 

/" Generate, prepare, and execute an SQL query. 7 
currcnt_tab->level - rel->current level 

call generate_SQL(current_tab^ql_sUnt); 

prepare and execute the sql__stmt with hstmt and selectDaPtr; 

} 

current_tab->sqIdaPtr - selectDaPtr; 

current tab—>hsmt - hsmt; 

fetch hstmt; 
}/* End of handling the new table 7 
if the child_node is an attribute_node, do { 

call 

process_att_text_rjode(rel,&child_rx>de,cu^ 
if not last one 

append outbuf with *\"; 
else 

append outbuf with ">"; 

} 

if the child_node is a text node, 

call 

process_aU_text_node(rel,&child_iiode,cwert 
if the child_nodc is an element_node { 
if (isTableNew} { 
/• Loop 7 
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/• Generate possible multiple occurrences of child_node. •/ 

/* The number of occurrences equals the number of rows to be fet c hed. */ 

while (fetch ok) { 

proccM_clcmeat„iKxk(hdbc,rel,current_tab &child_ncpdc,outbuf);( 

fetch on currenl - _tab->hstmt; 

/* NOTE: current_tab->selectDaPtr has new stuff */ 
}/' end of while loop V 
clean/free cunent_tab->sqldaPtr and current_tab->hstmt; 

} 

else { /• same old table V 

/* Generate a single occurrence of child__node. V 

call process element_node(hdbc/el^rrent_Ub,&child_node^}utbuf); 

} 

} /* End of child_node is element 7 
} /• End of loop "for each child" */ rel->current_level~; 
append outbuf with ("</%8>\n"£lcmcoL__iiamc); 

} 



[0879] Process Attribute/Text Node 

[0880] A process attribute or text node which has no other 
child node other RDB_node, generate the XML content for 
attribute and text. 



proccas_att_tert_node(DXX_REL *rel; 

DXX_TAB *current_tab; 

DOMElement'node; 
int type; 

DXXOutBuf "outbuf) 

{ 

working variables: 
chaif] attnamc; 
char[ ] clcName; 
chaif] xmlname; 
charfj value; 
chajf] tabname; 

1. if(typc - DXX^ATTRIBUTE) { 

call DOMgetAttribute to get the attname, 
xmlname — attnamc; 
} 

else { /* text node 7 

find the name of its parent element__node, call it eleName; 

xmlname — clcName; 

} 

2. call DOMgetChild to get the RDB_node 
tab name - RDB ^nodes' table name 

co Lname - RDB__nodcs* column name 

4. if (tab name « current_tab->name) 

call get Wuc(currcnt_tab,xmlnanie,& value); 
else { 

loop rel to find rel->tab{?] such rel->tab{?}->name-4abuame; 
if found, 

call getValue(rel->tab[?]„Tmlname,&value); 
else { 

internal error; 

} 

5. if (type - DXX^ATTRIBUTE) { 

append oulbuf with ("%s-%s", attname,value); 
else 

append outbuf with value); 

} 



Get Value 

Get the value as the column value from relational table, 
get Value (DXX_TAB •curreDi_tab, 
char 'nodenamc, 
char •attr_value) 

{ 

working variable: 



-continued 

int indcxfNSQLDA; 
for ( i - 0; i < current_tab->num_col; i++ ) 

if( cunrnt_tab->col2xml[t]jKnl - nodename ) { 
mdcxInSQLDA-^unent_col2xmlli \ indexlnSQLDA; 
get attr_value from current_tab->selectDaPtr 

using indexlnSQLDA; 
break; 

} 

} 

} 



G. A Technique to Store Fragmented XML Data 
into a Relational Database by Decomposing XML 
Documents with Application Specific Mappings 

[0881] This invention presents a technique which stores 
fragmented XML data into relational database tables by 
decomposing XML documents with application specific 
mappings. The mapping is based on the XML Path Lan- 
guage (Xpath) data model. Using this invention, a user can 
shred XML documents into new or existing database tables. 
This makes a relational database a repository of fragmented 
XML data. 

[0882] The technique parses an incoming XML document 
to be decomposed and parses an XML formatted Data 
Access Definition (DAD) with application specific mapping 
based on the XPath data model, generating two Document 
Object Model (DOM) trees. The DAD identifies relational 
tables and columns. One DOM tree is an XML document 
DOM tree and the other is a DAD DOM tree. The technique 
then works on both DOM trees to map data in the incoming 
XML document DOM tree to columns in relational tables, 
according to the DAD DOM tree. 

[0883] Additionally, the technique identifies different rela- 
tional levels and XML levels. Next, the technique generates 
SQL insertion statements based on the relational level, while 
taking data from a list of multiple occurrence XML element 
trees in the same XML level. Additionally, optimization and 
recursion techniques are used. 

[0884] FIG. 11 is a flow diagram illustrating the steps 
performed by the XML System to decompose XML docu- 
ments with application specific mappings. In block 1100, the 
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XML System receives an XML document containing XML 
data. In block 1102, the XML System parses the XML 
document to generate an XML Document Object Model 
(DOM) tree. In block 1104, the XML System receives a data 
access definition (DAD) that identifies one or more rela- 
tional tables and columns. In block 1106, the XML System 
processes the DAD to generate a DAD Document Object 
Model (DOM) tree. In block 1108, the XML System maps 
data from the XML DOM tree to columns in relational tables 
according to the DAD DOM tree. 

[0885] G.l Decomposing an XML Document into an 
XML Collection 

[0886] Decomposition refers to breaking down the data 
inside of an XML document and storing it into one or more 
relational tables. The data stored is basically un-tagged. The 
XML System provides stored procedures to decompose 
XML dau from an XML document. A user always needs to 
define a DAD for the decomposition. The user may enable 
an XML collection with a DAD first, then use the stored 
procedures. For example, when decomposing XML docu- 
ments into new tables, an XML collection must be enabled 
so that all tables in the XML collection can be created by the 
XML System. For some reason, a user may want to use 
existing tables to add additional data from incoming XML 
documents. In this case, the user needs to alter the tables to 
make sure the columns specified in the DAD exist in the 
tables. The enable collection operation will check this. If the 
user does not enable the XMLcoUection, the user must pass 
the DAD to the stored procedure. The sequence order of 
element or attribute of multiple occurrence will be reserved, 
only for tables created by the XML System. 

[0887] G.l.l Specifying an Xcollection 

[0888] In the DAD, a user still needs to specify the 
Xcollection. 

[0889] G.l.1.1 Mapping Scheme in XML Collections 

[0890] The mapping between composed/decomposed 
XML documents and an XML collection is specified in the 
Xcollection of a DAD. The XML System adapts the notation 
used in XSLT and uses a subset of it to define the XML 
document structure. In order to facilitate the mapping, the 
XML System introduces the element Relational DataBase 
node (RDB_node) to the Xcollection. 

[0891] The DAD defines the XML document tree structure 
using seven kinds of nodes defined by XSLT/XPath: 



[0892] 


root_node 


[0893] 


element_node 


[0894] 


text_node 


[0895] 


attribute_node 


[0896] 


namespacc_node 


[0897] 


processing_instruction_node 


[0898] 


comment_node 



[0899] For simple and complex compositions, the RDB_ 
Node is used to define where the content of an XML element 
or value of an XML attribute is to be stored or retrieved. 



[0900] The RDB_Node has the following components: 

[0901] Table: the name of the relational table or 
updateable view, in which the XML element content 
or attribute value is to be stored. 

[0902] Column: the name of the column which con- 
tains the element content or attribute value. 

[0903] Condition: the predicates in the WHERE 
clause to select the desired column data. 

[0904] In the definition of an Xcollection, for this embodi- 
ment of the invention, the following approach is used to 
define the mapping: 

[0905] RDBnode Mapping: Specify RDBnode for 
element_node, text_node and attribute_node in the 
XML data model. In this case, the XML System will 
generate SQL statements based on the RDB_nodes 
and document tree structure. 
[0906] The text_node and attribute_node will have a one- 
to-one mapping to/from a column in a relational table. 
Therefore, each of them will have a RDB node to define the 
mapping, where the RDB node is needed for the RDBnode 
mapping. It is possible that an element_node has no text- 
_node but only child element_node(s). 

[0907] Using a RDB node to specify an element_node, 
text_node and attribute_node is more general For an 
element_node, only the root element_node needs to have a 
RDB node. In this RDB node, the user is required to 
specify all tables used to compose/decompose data, as well 
as a join condition among these tables. The condition 
predicate in this RDB node will be pushed down from the 
root element node to all child nodes. Ideally, the way to tie 
all tables together within an XML collection is the primary- 
foreign key relationship. However, it often happens that 
some existing user tables do not have such a relationship. 
Therefore, requiring the foreign key relationship for com- 
position is too restrictive. However, in the case of decom- 
position, if new tables are created for storing the decom- 
posed XML data, then the DAD requires a user to specify the 
primary key of each table, as well as the primary-foreign key 
relationship among tables. 

[0908] G.l. 1.2 Sample DADs for XML Collections 

[0909] Assuming the XML documents need to be com- 
posed or decomposed are like the one shown in the example 
above, the following sample DAD shows how to define the 
mapping from relational tables using RDB_node Mapping. 
In particular, the following example DAD shows how to 
compose/decompose a set of XML documents from/into 
three relational tables while using the RDB_node to specify 
the mapping. 



Litem__DAD3.dad 
<?xml version— "1.0"? > 

<!DOCTYPE Order SYSTEM "E:Vdtd\dad.dt£r> 
<DAD> 

<dtdid>E :\dtd\lineltem.dtd</dtdid> 
<vali dado n> YES </validation> 
<XcoUectioa> 

<prolog>7xmI vcreioo-"1.0"?</prolog> 

<doctypc>!DOCTYPE Order SYSTEM " E:\dId\lincItem.duT </doctype> 

<root_jK>de> 

<elcment_nodc aame="Ordcr"> 
<RDB_Qodc> 
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-continued 



<table aame-**order_tab" key-"order_Jcey"/> 

<tablc name-"part_tab" kcy-**part kcy"/> 

<table name-"ship_tab" key»**datefcf/> 
<cooditicm> 

order_tab.order_Jcey - part_tab.order_Jtey AND 

part_tab.pait key « ship_tab.part key 

</condiiion> 
</RDB_oode> 

<attribute_node name-"Key"> 

<RDB_node> 

<table name- - order_tab"/> 

<oolumn name-"otder_Jcey" type-** integer"^ 

</RDB_node> 

</attribute__node> 

<eleraenI_node name-"Customer"> 

<tcxt_oodc> 

<RDB_node> 

<table aame-**order_tab7> 

<column name-"cratomer'* type-"char(128y7> 

</RLDB_aode> 

</text_node> 

</element_node> 

<elemeot_node name-"Part"> 

<attribuie node name— "Key"> 

<RDB_ J node> 

<table name-**part_tab*7> 

<coLumn namcV*part Jcey" type»"integcr"/> 

</RDB_Dodo 

</attribute__node> 

cclcment node namc- M Quaiitity n > 

<text_node> 

<RDB_node> 

<tablc namc-"pait_tab"/> 

<column name-"qty" type-**integei^/> 

</RDB_node> 

</text_j»de> 

</elemcnt_nodc> 

<clcmeoL_nodc name-~ExleodedPrice"> 

<text_nodc> 

<RDB_node> 

<table name-**part_Jab*7> 

ccolumn name-"price" typc-"rcal'*/> 

<coodition> 

price > 2500.00 

</coodiiio[i> 

</RDB_node> 

<ftext_aode> 

</elemeat_node> 

<rlcmcnl_node name-"Tax"> 

<tcxt_nodc> 

<RDB __node> 

<table name-'*part_tab"/> 

<column name-**tax" type- - real "/> 

</RDB_Dode> 

</tcxt_node> 

</c lc me at node > 

<clcmeni__node name— "shipment^ 

<elemenl„node name—**ShipDate"> 

<text_joode> 

<RDB_node> 

■dable name— **ship_tab'7> 

<column name- 4 * date" type-"date'7> 

<conditioQ> 

date > -1966-03 -01" 

</coodition> 

</RDB_j»de> 

</text_ Dode> 

</ele ment _node> 

<element_node name»**SbipMode"> 

<text_node> 

<RDB_node> 

•ctable Dame— "ship _tab**/> 

<column name-" mode" type«*:har(12Q)7> 

</RDB_Dode> 

</text_node> 

</clement_node> 



-continued 

<element__node aame- M Coinmeof*> 

<text_jiodc> 

<RDB _node> 

<table name— "ship tab"/> 

<column name-" comment" type— "Varcbai(2k)7> 

</RDB_node> 

<AcxL_Dode> 

</clement_node> 

</clcment nodex! — end of element Shipment* 

</clemcnt nodex! — end of element Part — > 

</clemcnt_node><! — end of clement Order — > 
</root__nodc> 
</Xoollection> 
</DAD> 



[G>910] G.1.2 Defining Xcollcction for Decomposition in 
DAD 

[0911] One DAD can used for both composition and 
decomposition. For decomposition, additional information 
is required to be specified in the DAD, however this infor- 
mation is just ignored when the DAD is used for composi- 
tion. 

[0912] The additional information needed for decomposi- 
tion is described below: 

[0913] Primary Key for each table in the RDBjiode for 
the root clcmcnt_oode 

[0914] In order to tie XML collection tables together, the 
primary-foreign key relationship is preserved among the 
tables. The primary key can consists of a single column or 
as a composite one consisting of multiple columns. The 
primary key must be specified for each table. A user does so 
by adding the attribute key to the table element of the 
RDB_node. In the following example, the RDB_node of the 
root element_node "Order" is defined as: 

[0915] <element_node name-"Order"> 

[0916] <RDB_node> 

[0917] <table name- w order tab" key- w order 
key7> 

[0918] < table name- a part_tab w key-"part_key, 

price7> 

[0919] <table name-"ship_tab" key-"date7> 

[0920] <condition> 

[0921] orcierJab.oiderJcey«part_tab.order_key 
AND 

[0922] part_tab.part_key-«hip_tab.part_key 

[0923] </condition> 

[0924] </RDB_node> 

[0925] with the keys specified. In the above example, the 
primary key of part_tab is a composite one. 

[0926] Data Type of the Cbhimn 

[0927] In order to create the right data type for each 
column when creating new tables during the enabling XML 
collection process, the column type must be specified for the 
RDB_node for each attribute_jiode or text_node. This is 
because the real data is mapped from an XML attribute value 
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or element text to relational columns. A user does so by 
adding the attribute type to the column element. In the 
following example, the RDB_node of attribute_node "Key** 
of the eIement_node "Order" is defined as: 

[0928] <attribute_node name-"Key"> 

[0929] <RDB_node> 

[0930] <table name-"order_tab7> 

[0931] <column name- u order_key" type- u inte- 
ger7> 

[0932] </RDB_rjode> 

[0933] </attribute_node> 

[0934] where the type is defined as integer. 

[0935] G.1.3 Enabling an XML Collection 

[0936] The purpose of enabling an XML Collection for 
decomposition is to parse a DAD, create new tables or check 
the mapping against existing tables. The DAD is stored into 
the XML_USAGE table when the XML Collection is 
enabled. 

[0937] When a user prefers to have the XML System 
create collection tables, the user should enable the XML 
collection. Additionally, the enablement depends on the 
stored procedure the user chooses to use. The stored proce- 
dure dxxInsertXML( ) will take XML Collection name as 
the input parameter. In order to use the stored procedure 
dxxInsertXML( ), the user must enable an XML collection 
before calling it. The user can call stored procedure dxx- 
ShredXML( ) without the enabling of an XML collection by 
passing a DAD. In the later case, all tables specified must 
exist in the database. 

[0938] G.l.3.1 Enabling XML Collection Option 

[0939] For composition, an XML collection is not required 
to be enabled. The assumption is that all collection tables 
already exist in the database. The stored procedure can take 
a DAD as an input parameter and generate XML documents 
based on the DAD. On the other hand, the composition is the 
opposite of the decomposition. For XML collections enabled 
during the decomposition process, the DAD is likely to be 
used to compose XML documents again. If the same DAD 
is used, then the collection can be enabled for both compo- 
sition and decomposition. 

[0940] An XML Collection can be enabled through the 
XML System administration GUI (graphical user interface) 
or using the dxxadm command with the Enable_collection 
option. The syntax of the option on a DB2 server is as 
follows: 

[0941] dxxadm enable_collection dbname collection- 
jaame DAD file [-t tablespace] 

[0942] where 

[0943] a. db_name: name of the database. 

[0944] b. collection_name: name of the XML collec- 
tion, which will be used as the parameter to the 
stored procedures. 

[0945] c. DAD_file: Data Access Definition(DAD) 
file. 



[0946] d. tablespace: Optional. The tablespace con- 
tains tables in the XML collection. If new tables need 
to be created for decomposition, then new tables will 
be created in the specified tablespace. 

[0947] The following is an example of enabling the XML 
collection called sales_order in database mydb with the 
DADJile Iitem_DAD3.dad. 

[0948] /home/ul>dxxadm enable_collection mydb sale- 
s_order Litem_DAD3.dad 

[0949] DXXA009I XML System is enabling collection 
sales_order. Please wait. 

[0950] DXXAO010I XML System has successfully 
enabled the collection sales_order. 

[0951] /home/ul> 

[0952] The enable_collection option mainly does the fol- 
lowing things to a database: 

[0953] e. Read the DAD file, call XML parser to 
parse DAD, and save internal information for map- 
ping. 

[0954] f. Store internal information into the 
XML_USAGE table. 

[0955] The option is good for performance and is usually 
helpful to perform composition and decomposition using 
one DAD. 

[0956] G.l.3.2 EnablejcoUection Option 

[0957] The enable_collection option enables an XML col- 
lection associated with an application table. The association 
between the application table and the side table specified by 
the DAD is through the root_id. 

[0958] Syntax 

[0959] dxxadm enable__oollection db_name collection 
DAD_File [-t tablespacej-l login][-p password] 

[0960] Argument 

[0961] g. db_name: the database name. 

[0962] h. collection: the name of an XML collection. 

[0963] i. DAD_File: the file containing the DAD. 

[0964] j. tablespace: Optional. The tablespace con- 
taining a user table specified in the DAD or side table 
created by the XML System. 

[0965] k. login: Optional. The user ID, which is only 
needed if the command is invoked from a DB2 
client. 

[0966] 1. password: Optional. The password, which is 
only needed if the command is invoked from a DB2 
client. 

[0967] The enable_collection option will enable an XML 
collection. The enablement process is to parse the DAD and 
prepare tables for XML collection access. It takes the 
database name, a name of the XML collection, a DAD_File 
and an optional tablespace. The XML collection will be 
enabled based on the DAD in the DAD_File. It checks 
whether the tables specified in the DAD exist. If the tables 
do not exist, the XML System will create the tables accord- 
ing to the specification in the DAD. The column name and 
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data type is taken from the RDB_node of an attribute_node 
or text_node. If the tables exist, the XML System will check 
whether the columns were specified with the right name and 
data types in the corresponding tables. If a mismatch is 
found, an error will be returned. The tablespace is optional, 
but it is specified if the collection tables are to be created in 
a tablespace other than the default tablespace of the specified 
database. 

[0968] The enable_collection is required for decomposi- 
tion stored procedure dxxInsertXML( ), and its pairing 
dxxRetrieveXML( ), and the dxxUpdateXML( ). For stored 
procedure dxxGenXML( ) and dxxShredXML( ) which take 
a DAD as input, the enablement of an XML collection is not 
required. For the latter stored procedures, it is assumed that 
all tables specified in the DAD exist in the database already. 
If they don't exist, an error will be returned. The cnable_c- 
ollection does have a pairing disablecollection option. But 
the operation of disable_coUection is much simpler. It just 
deletes the collection from XMLJJSAGE table. 

[0969] As discussed above in Section E, FIG. 7 is a 
diagram illustrating code organization to compose XML 
documents. The dxxGenXML( ) stored procedure 700 and 
the dxxRe trie ve XML( ) stored procedure 702 each invoke 
the dxxComposeXML( ) module 704. Decision block 706 
determines whether RDBjnode mapping is to be performed. 
If RDB_node mapping is not to be performed, processing 
continues with SQL to XML module 708 and the SQL to 
XML mapping is performed by the SQL to XML mapping 
technique 710. If RDBnode mapping is to be performed, 
processing continues with the dxxGenXMLFrom RDB( ) 
module 712 and RDB node mapping is performed by the 
RDBR technique. 

[0970] G.1.4 Using Stored Procedures for Decomposition 

[0971] The decomposition of XML documents from an 
XML collection is performed through the use of stored 
procedures. The XML System provides the following stored 
procedures to compose documents. 

[0972] dxxInsertXML( ) 

[0973] The dxxInsertXML( ) takes two input param- 
eters, the name of an enabled XML collection and the 
XML document to be decomposed. It returns two 
output parameters: an error code and an error mes- 
sage. 

[0974] The stored procedure dxxInsertXML( ) inserts 
an input XML document into an enabled XML 
collection which is associated with a DAD. The 
collection tables and the mapping are specified in the 
DAD. During the enabling process, the XML System 
has already checked or created the collection tables 
according to the specification of Xcollection. The 
stored procedure dxxInsertXML( ) will decompose 
the input XML document according to the mapping 
specified in the DAD and insert un- tagged XML data 
into the tables of the named XML collection. 

[0975] Stored Procedure Declaration: 



djcdnscrtXML(chaf(UDB_SIZE) collcctionNamc, /• input */ 
CLOB AS LOCATOR xmlobj, /* input 7 



-continued 

long returnCode, /* output 7 
varchar(1024) retumMsg) /* output 7 



[0976] Parameters: 

[0977] collectionName: IN, the name of an enabled 
XML collection. 

[0978] xmlobj: IN, an XML document object in 
CLOB LOCATOR type. 

[0979] returnCode: OUT, return code in case of error, 

[0980] retumMsg: OUT, message text in case of 
error. 

[0981] The following is an example of this stored proce- 
dure call. 



EXEC SQL INCLUDE SQLCA; 

EXEC SQL BEGIN DECLARE SECTION; 

char collection! 64]; /* name of an XML collection */ 

SQL TYPE is CLOB__fTLE xmlDoc; /• input XML document */ 

long returnCode; /* error code */ 

char returnMsgf 1 024]; /* error message text V 

short coilection_ind; 

short rmlDoc._ ind; 

short rcturnCode_Jnd; 

short retumMsg_ind; 

EXEC SQL END DECLARE SECTION; 

/• im'ri»liyjt host variable and indicators */ 

strcpy(collcction,'sales_order") 

strcpy(xnriobj.name/e:\xmr\ordcrl.xmr); 

xmlobj juunc_lcngth^tilcn("c :\xmJ\order 1 jtml"); 

xmlobj ^e_c^tion-SQL_FILE_READ; 

returnCode * 0; 

returaMsg[0] - VT; 

collection_ind *■ 0; 

xmlobj _ind ■ 0; 

returnCode ind - -1; 

rcturnMag Lnd - -1; 

/• CaJl the stored procedure "/ 



[0982] EXEC SQL CALL 

o^2xml!dxxInsertXNfL(:collection:collection_ind; 

[0983] :xmlobj:xmlobj_ind, 

[0984] : return Code:returnCode_ind, :returnMsg:re- 
turnMsg_ind); 

[0985] If the XML collection sales_order is enabled with 
Litem_DAD3. dad, then the dxxInsertXML( ) call will 
decompose the input XML document "e:\xmI\orderLxml" 
and insert data into the sales jorder collection tables accord- 
ing to the mapping specified in Litem_DAD3.dad. 

[0986] dxxShredXML( ) 

[0987] The stored procedure dxxShredXML( ) works 
as same as the stored procedure dxxInsertXML( ) 
except that it takes a DAD as the first input parameter 
instead of a name of an enabled XML collection. 
Therefore, it can be called without enabling an XML 
collection. 

[0988] The stored procedure dxxShredXML( ) inserts 
an input XML document into an XML collection 
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according to the Xcollection specification in the 
input DAD. If the tables used in the Xcollection of 
the DAD do not exist or the columns do not meet the 
data types specified in the DAD mapping, an error 
will be returned. The stored procedure dxx- 
ShredXML( ) decomposes the input XML document 
and inserts un-Ugged XML data into the tables 
specified in the DAD. 

[0989] Stored Procedure Declaration: 



dxxShredXML(CLOB as LOCATOR dad, /* input 7 
CLOB as LOCATOR xmJobj, /• input 7 
long *rctumCode, /* output 7 
varchar(1024) 7cturnMsg) /* output 7 



[0990] Parameters: 

[0991] dad: In, a DAD in CLOB Locator type, 

[0992] xmlobj: IN, an XML document object in 
CLOB LOCATOR type. 

[0993] returnCode: OUT, return code in case of error, 

[0994] retumMsg: OUT, message text in case of 
error. 

[0995] The following is an example of the dxxShredXML( 
)call. 



EXEC SQL INCLUDE SQLCA; 

EXEC SQL BEGIN DECLARE SECTION; 

SQL TYPE is CLOB_JjOCATOR dad; /• DAD7 

SQL TYPE is CLOB_FTLE xmlDoc; /* input XML document 7 

long return Code; /* error code 7 

char ietumMsg[1024l /* error message text 7 

short dad _jnd; 

short xmlDoc_ind; 

short retumCode_ind; 

short returnMsg_Jnd; 

EXEC SQL END DECLARE SECTION; 

/* initialize host variable and indicators 7 

/• make dad as the CLOB_LOCATOR of dad 7 

strcpy(xmlobj.riaine/e:VariJ\orderl.xinl'); 

rmlooj.name Jength-stilen("e:\nnI\order 1 .xmT); 

xmlobj\fiJe_option-SQL_FILE_READ; 

returnCode - 0; 

retumMsg(0] - \0'; 

dad_ ind - 0; 

xmlobj_ind - 0; 

retumCode_ind - -1; 

retumMsg ind - -1; 

/• Call the stored procedure 7 

EXEC SQL CALL db2rmJldxxShredXML(:dad:dad_ind; 
□nnlobj :xrnlobj_ind, 

:rcLurnCodc:rc turn Code ind, :returaMsg:rcturnMsg__ind); 



[0996] If the content of DADbuf has the 
Litem_DAD3.dad content, then the dxxShredXML( ) call 
will decompose the input XML document 
"e:\xml\orderl.xmr and insert data into the salesjorder 
collection tables. 

[0997] G.2 Example Decomposition 

[0998] The following illustrates an example of decompos- 
ing an XML document into a relational database using a 
RDB node in the DAD. 



XML Document to be input: 
<?xml version-* 1.0" ?> 

<!DOCTYPE Order SYSTEM "E:\dxi\tcst\dtd\l tic rn.dtd"> 
<Order Key-"1"> 
<Customer> 
<Name>General Motor </Name> 
<Emafl>parts@gxn.com<^Ernail> </Customer> 
<Part Color- " red "> 
<Key>68</Kcy> 
<Quantity>36</Quantity> 
<ExtendedPrice>34850.1 6 </ExlendedPrioe> 
<fIax>0.06</Tai> 
<Shipment> 
<ShipDate>1998-08- 19</ShipDate> 
<ShipMode>BOAT</ShipMode> 
</Shipment> 
<Shipment> 
<ShipDate>1998-08-20</ShipDate> 
<ShipModc>AIR</ShipMode> 
</Shipment> 
</Pait> 

<Paxt Color-"rcd"> 
<key>128</key> 
<C*^tiiy>28</Quantity> 
<Exten(kdrMce>38000.00</Extena£dPrice> 
<Tax>0.07</Thx> 
<Shipment> 

<I- This shipment will not be inserted. — > 

<ShipDate>1961-01-01</ShipDate> 

<ShipMode>BOAT</SnipMode> 
</Shipment> 
<Shipment> 

<ShipDate>1998-12-30</ShipDate> 

<ShipMode>TRUCK</ShipMode> 
</Shipment> 
</Part> 
</Ordcr> 

Document Access Definition (DAD): 
<?xml version-"1.0"?> 

<!DOCTYPE Order SYSTEM "E:Wtd\dxxoaddtd"> 
<DAD> 

<dtdid>E:\4td\Uneltem.dtd</dtdid> 

<validation>YES</validatiori> 

<Xcollection> 

<root_jiodc> 

<elcmcnt_node name-'Oider"> 
<RDB_node> 
<table name-"order_tab" key-"order_Jcey7> 
<table name-'parLJab" key-"part_Jcey7> 
<table name-"ship_tab" key»"sbip_date7> 
<condition> 
order_tab.order_key-part_tab.o _Jtcy AND 
part_tab.part _Jzy«ship_Jab.p_jEey 
</condition> 
</RDB_nodc> 

ottributc node name-" Key" > 

<RD B_nodc> 
<tablc name-"order_tab"/> 
<column name-*ordcr_Jcey" type-"integer"/> 
</RDB_nodc> 

</attributc nodc> 

<elcment_Dode name— "Customer"> 
<elcmcnL_node name»"Name"> 

<text node> 

<RDB_node> 
<table name-»'order_tab7> 

•ecolumn name-'customer_name" typc-'char(128)7> 
</RDB_node> 
</tcxt_node> 
</element_j»de> 
<elcmcnt _node name—" Email "> 
<texL_Dode> 
<RDB_Dodc> 
•ctablc name-"order_JabV> 

<oolumn name=*customer_email" type='char(128)7> 
</RDB__Dode> 
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<yicrt _jiode> 

</element node> 

</eletncDt_node> 
<clement_node name-' Part" > 
cattribute _Dodc namc- a Color"> 
<RDB_node> 
<table name-'part_tab7> 
<cohnnn name-"color7> 
</RDB_nodc> 
</attributc__nodc> 
«xlement_nodc name—* Key" > 
<Iext L _node> 
<RDB_node> 
<table namc—"part_tab7> 
<cotumn name— ■part.Joey" type-"integer7> 
</RDB_nodo 
</text__node> 
</eIement__node> 

<element_node namc-"ExtcndedPricc*> 

<text_node> 
<RDB_node> 
<table name-"part_tab7> 
■ccoiumn name— "price" type-' real 7> 
<conditioa> 
price > 2500.00 
</condilkm> 
</RDB_aodo 
</text_nodc> 
</elcmcnt_aode> 
<clcmcnt _£ode name— Tax" > 
<text_node> 
<RDB__nodc> 
<tabk name— "part_tab7> 
<cohimn name- "tax" type— "rea!7> 
</RDB_node> 
•^text_node> 

</element nodc> 

•eelement node name* "Quantity "> 

<text_oodc> 
<RDB__nodc> 
<table name-"part_tab7> 
<column name-"qty" type— "inieger7> 
</RDB_node> 
</text_nodc> 
</clcmcnt node> 

<clcmcat_oode name-"shipment'> 
eelement __node name— "ShipDatc'> 
<text_node> 



<RDB_nodc> 

<table name* "ship tab"/> 

ccolumn name— "date" type— "datc"/> 

<conditioa> 
date > '1966-01-01' 

</COOditiDQ> 

</RDB_node> 
</text_node> 
</eleinent_node> 

<eleraent node name— "Ship Mode" > 

<tcxt_nodc> 
<RDB_node> 
<tablc nflnv^'^h' p twh 7^ 
<cohimn name-" mode" type-"char{128)7> 
</RDB_Bodc> 
<Aext_node> 
</element_node> 

<elemenL_node name-"Comments"> 
<text_node> 
<RDB_jiode> 
<table name— "ship_tab"/> 

<cohinm name— "comments" type-"varchar(2000)7> 
</RDB_node> 
</text_node> 
</elemcnt_node> 
</clcmcnt__oodc> <!— end of element Shipment — > 
Vclcment_Dode> <!— end of element Part — > 
</eleincnt_Dode> <!— end of element Order — > 
</root__node> 
</Xcol lection > 
</DAD> 



[0999] G.3 Detailed Techniques 

[1000] The following discussion focuses on the technique 
for one embodiment of the invention. In particular, the 
following includes pseudocode and data structures used by 
the technique. 

[1001] G3.1 Data Structures 

[1002] The following data structures are used for decom- 
postion of XML documents using RDB-nodes. 



f * 

Column to XML node pair 

. */ 

typedef struct col2xml { 

char col[DB2_TAB_COL_VIEW_LEN]; 

char dataType{ D B2_TAB_CO L_ VIEW_JLEN+8 \> l m data type of the column V 
char xml(DXX_XML_FIELD_SIZE]; /* xml can be attribute or element name •/ 
inl xmrrype; /* DXX_ATTRIBUTE or DXX_TEXT 7 
int xmlLcvel; /* for determining if a column 

needs to be copies to the 

next row */ 
int indexInSQLDA; 
} DXX__COL2XML; 

/• 

* Table and Column pair 

* 7 

typedef struct tabcol { 

char tab(DB2_TAB_COL_VIEW_LEN); 

char colIDB2„TAB_COL_VT£W_LEN]; 

int indexInSQLDA; 

} DXX_TAB_COL 

/• 
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* Join information 

. V . 

typedef struct join__info { 

char coll DB2_TAB_OOL_VIEW_LENt 

/* column used in the join condition 7 
int num__join; /* number of foreign columns to be joined */ 
DXX_TAB_COL forcignpXXJ^UM_PORHGrV]; 

/* foreign column and its table to be joined */ 
} DXX_JO(N_JNFO; 

/• 

* Primary key Information: 

• 7 

typedef struct pri key { 

int nunv_col; /* number of columns in the key */ 

char oame(DXX^jsnJM_MAPPINGlDB2_TAB„COI^VIEW __LENJ 

/* the column names of the primary key */ 
} DXX_PRIKEY; 

/*— 

* Foreign key Information: 

. m f 

typedef struct for_Jcey { 

int num__col; /* number of columns in the key 7 
DXX_COL2XML col[DXX_NUM_MAPPING); 

r the columns of the foreign key 7 
} DXX_PORKEY; 

/• 

* Table Information: 

•/ 

typedef struct tab { 

char name(DB2_TAB_COU_VIEW_J J Et^l /* name of the table 7 
DXX_PEUKEY pri„kcy; /* primary key 7 
DXX__FORKEY for_key; /* foreign key */ 
int level; /* relational level of the table */ 
char top_elemeiu{DXX^JCMl t _FIELD_^IZE]; 
/* the highest level of XML element using 

* column data of this table 7 

char sqLstmt(MAX_STMT__LENl; /" SQL statement 7 

SQLHSTMT hstmt; /* CLI statement handle */ 

struct sqlda 'aqldaPtr, /* pointer of selected data 7 

int num_col; /* number of columns which data used in the 

* XML document */ 

DXX_OOL2XML col2xml(DXX_NUM_MAPPINGt 
/* mapping between column in this table to 

* XML attribute or element text 7 

int nunx_join; i* number of columns in this table which 

" will form join conditions 7 
DXX_JOIN_JNFO join_Jnfo(DXX__NUM_JOIN_XN_TABLEi /* join information 7 
char (»ndition[DXX_CX>NDrnON_IN_REL]; 
/* condition of select for columns in this 

* table •/ 

} DXX_TAB 

/* 

* Relation Information for entire XML documents 

* */ 

typedef struct rel { 

int current _Jevel; /* the current XML level during treversal */ 

int num_tables; /* number tables to generate/decompose XML doc*/ 

DXX_TAB *tab[DXX_NUMTAB_IN_REL; ]f * details of each table 7 

char **top_elementa /" index to tab[i]->top__element for fast search. */ 

SQLHDBC hdbc; /* CU connection handle 7 

} DXX_REL; 

/• 

* Data structure to store a row 

* 7 

typedef struct row { 
int num_coI; 

char **coldata; /* Array of pointers V 
} DXX__ROW; 

/• „ 

" Data structure to store the rows 

• _ 7 

typedef struct rows { 
int num_rows; 

DXX__ROW • ruw{DXX_MAX_ROWS]; 
} DXX__Rows; 
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[1003] The following routine will enable a column. 
[1004] enableCoU(dadbuf,tablespace,sqlstate, msgtext) 

[1005] dadbuf a memory buffer containing 

a document ac 

[1006] sqlstate OUT: the SQL state in case 

of errors 

[1007] msgtext OUT: message text in case 

of errors 

[1008] Note that in the following routine, the sqlstate and 
msgtext are ignored in one embodiment. 



DOMElement dad; 
DOMEIement dad__root_element; 

DXX-jlEL relation_Jnfo; 
DXX-KEL *rel - &relarion_Jnfo; 
I* Parse the DAD and prepare the SQL query */ 
parse dad_buf to get the DOM tree of DAD; 
dad_root_elcment - the first clement __node of dad; 
/* init relational structure */ 



-continued 



rel->current level - 0; 

rel->num_tables - O, 
/* set relational structure */ 

&etRcl(rel,dad,DXX_DEOOMPOSE); 
/* process relational information */ 

process^l(rcl,dad_root_elemenl/),DXX__DECOMPOSE>, 
/* prepare_tables */ 
prepare tables(rcl, tables pace); 
/• Register collection in XML_usage table */ 
insert a new row into XML_usage table, using collection name; 



[1009] G3.2 Setup Relational Structure 

[1010] This routine will process the RDB node of the top 
element code and initialize the REL structure of entire XML 
documents. After the process, all tables involved in com- 
posing/decomposing XML documents should be included in 
the REL structure, and the relationship between tables 
should also be recorded in the REL. 



setRel(reLdad, dxx__type) { 
working variable: 

DXX_TAB *currcnt_tab; 
get the RDB__node of the top_element__node; 
if no RBD_node 

return error; 
for each table in RDB node do { 

allocate space for currenL-tab; 

CTirrent„tab->namc - RDB node's table name; 

if (dxx_type — DXX_DECOMPOSE) 
curTent_tar>>piimary_key_name - RDB_node's table key; 

current__tab->top ..element - NULL; 

current_tab->level - 0; 

current_tab->niun_col - 0; 

current_tab->selectDaPtr - NULL; 

current_tab->condition - NULL; 

if ( RDB_node's condition !- NULL ) 

scan the condition of RDB_node, 

currents tab->aum_Join - 0; 

for each currcnL_tab.coIumn in the predicate do { 
i - cunent_tab->num_joio; 

cuncnt_tab - >join__inf o( i J.co 1 - currat__tab .column; 

cu rrent_tab- >jo ia_infb( i].num_g o in - 0; 

for each other_tab.col - current tab.column do { 

j - current tab->join info{i].mim join; 

ciirrcnt_tab->join__infofi].foreign[j].col « col in other tab 

current_tab->join_jnfo{i3.foreignrj].tab » other_tab 

oment_tabojoin_Jnfo{i].forei^ 

current_tab->join_in fo[iJ.num_join++; 

} 

current tab->num joirn-f; 
} 

rel->tab(i] - current_tab; 
rel->num_tables++; 

} 

Example: 
rel->tab[0].name - "order__tab"; 
rel->tab[0].primajy_jcey„name - "order_Jcey"; 
rel->tab[0].level - 0; 
rel->tab[0j.top„element = NULL; 
rel->tab[oiselectDaPtr - NULL; 
rel->tab[0j.nurjx_col - 0; 
rel->tab[03.num„join - 1; 
rel->tab[0]->join_info[0]->col - "order_Jcev"; 
rel->tab(0]->join info[0]->nu m j oin => 1; 
rel->tabl0|->join_uifo{0|->foreignl0].col » "o.Jcey"; 
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rcI->tab(0>>joiiLjnfo(Ol->forcign[0].tab - ~part_tab"; 

rel->Od](0>>joui^o{0]->foreign(0].indexInSQLDA - DXX_>K)NE; 

rtl-MabtOj-xxmdiiion - NULL; 

rcl->tab{ 1 J.aame - "part_tob"; 

rel->Uib( 1 }primary_Jtcy_jiflmc - "partjcey"; 

rel->tab(4levcl - 0; 

rcl->tah(litop_clcment - NULL; 

rel->tab(liaelectnaPtr- NuLL; 

rcI->tab(llmim_col - 0; 

rei->tab( 1 J.rmnx_Join • 2; 

rcl->tab(l )->jouu_infb{0)->ool - "o^key"; 

rcl->tab( 1 ]->join_info{0}->mim_join - 1; 

rel->tab(l)->jotn_jnfo{0]->foreign(0].col - "order _Jcey"; 

rel->tab{l]->join info{0]->foreign[0].tab - "order_tab"; 

reL->tob(l>>joLn_iiifo{0>>forei^oi indexing - DXX_WNE; 
rcl->tah(l >>jo b_Jnfo(l]->col - "parOry"; 
reI->tab(l)->join_Jnfo{l}->nuirL_join - 1; 
rel->t^jy>join_info{l}->foreign(Oj.ool - "p_^key"; 
rel->t^l}'>join_Juifo(l]->£6reign[0].t&b - •*ship_tab"; 
re!->tarXl]->joLix^o(l]->fordg4oiirjdexIriSQLDA - DXX^NONE; 
rel->tab( 1 >xxmdilion - NULL; 
rei->tab(2lmune - "8hip_tab"; 
rel->tab(2). primary _key __name - **ship_date"; 
rel->tab(2}.top_element - NULL; 
rcl->tab(2j.level - 0; 
rel->tab(2).selectDbPtr - NULL; 
rel->tab(2].iiuin_ool - 0; 
rel->tab(2j.iiuin_joiii - 1; 
rel->tab|2)->]oin_Jnfo(0)->ool - "p Joey"; 
rel->tab[2]->joiaJnfo(0]->niim^join - 1; 
rel->tab(2>>join_infb[0)->forign(0]xol - "part _key"; 
rcl->tab(2V>join_Jnfo{0]->forign|0].tab - "part__tab w ; 
rcl->tob(2}->joinJnfo(0}.>foreigr40l.ir^ - DXX_JNONE; 

rcl->tab[2>>condition - NULL; 
rel->num_tabies - 3; 



[10U] G.3.2.1 Process Relational Information 

[1012] This is the second phase of the preparation process 
to gather all information to generate SQL statements. It 
recursively processes each RDB_node for each attribute, 
text, and element node of DAD, and records the mapping 
relationship into the REL data structure. 



proces*_rel(rel, dora_nodc,current_Jevcl, dxx_type) 
{ 

/* The first pass fills in everything in rel, 

except foreign_Jcey. */ 

process rell (rel, doro node,cufrent _Jevel, dxx__type); 

if (dxx_type — DXX_DECOMPOSE) { 

/* The second pass fills in the forcign^key of each table 

in rel and update num_col. */ 
process re!2(rel); 

} 

findTopElements(rel, dom node, dxx_type); 

Create a hash table or sorted array of all top_clcmcnts 
rel->top_elements for fast search. 

} 

process_rell(rcl, dom_node, dxx_typc) 

{ 

if (dom _ node type — ele merit _node) { 
rel->current_level-M-; 
child - DOMGetFirstChild; 
while (child !- NULL) { 

process _rell (rcl,child,dxx type); 

child - DOMGetNextchild; 

} 

} 

if (type — aUribute_node || 
type ™- text_nodc | 

(dom_oode is a leaf clement node && DXX_DECOMPOSE)) { 
call DOMgetChild to get the RDB_node of dom_Dode; 



-continued 



if ( there is no RDB _node ) 
return error; 

call process_RDB_node(rel/iom_noojc, RDB_node,doL_type); 

} 

if (dom__nodc type — element_rjode) 
rel->currcnt level — ; 

} 

/* Fill in the foreign_key of each table in rel. */ 

pTocess_rel2(rel) 

{ 

/* Working variables: */ 
DXX_TAB ♦forcignTab; 

for (tab Index - 0; tablndex <rcl->num_tables; tablndex ++) { 
current_tab - rei->tab [tablndex]; 
for (t - 0; i < current_tab->num_join; i++) { 

for (j-O; j< current^tab>join_Jnfc{i].num_Join; j++) { 
foreignTab - findTabByName(rel, 

current_tab->joiA_irifo[i].foreigi4jJ.^ 
if (foreignTab->level < current_tab~>lcvel) { 
strcpy(cwent_jab->Ebreign_3Dcyxol , 

cunent__tab->join^rifc{i].forcignfj].col); 
Find k such that foreignTab->col2xml[k].col 

equals current_tab->foreigrx_key.col; 
current„tab->foreign_j£eyjml - 
foreignTab->cotIniilfk]xnil; 

current_tab->foreign_lrjeyjmlType - 

foreignTar>->col2xir4k].inirrype; 
current_tab->foreign_Jcey.dataType = 
forcignTab->col2xml(k].dataType; 

} 

} 

} 

} 

} 
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[1013] G.3.2.2 Process RDB_node 

[1014] The following routine will process a RDB_node. 



proces8_RDB„Qodc(rcl f dom_Qodc, rdb_node, drx_type) 
{ 

working variables: 

DXX_TAB *mytab; 
tut i; 

/• It is attribute _nodc, teH_node, or a leaf element_node in DAD. "/ 
/• Note: table must be specified in RDB_nodc */ 

mytab - rci->tab[?] such that mytab. name - table of RDB„node 

if do mytab found 
return error, 

i - mytab->niun_col; 
mytab- >col2xml[i].col-coluiim of RDB_node; 
mytab- >col2xmI[i]xinlLevel - current_level; 
if ( dxx_type — DXX_DEOOMPOSE ) { 

mytab- >col2iml{i].dataType- type of RDB_node; 

} 

if (it is an attribute _node) { 

mytab- >col2xml{i]janl-name of the attribute 
if ((brx-type — DXX_DECOMPOSE) 

mytab- xx)12xinl[t}jLmJType - DXX_ ATTRI BUTE; 

} 

else if (it is a text__node) { 

mytab- >col2xml(iJjtinl -name of the parent clement__node; 
if (dxx_type — DXX_DECOMPOSE) 

mytab- >co l2xmJliJjunlType - DXX_TEXT, 
} else { /* It is a leaf element _jiooe in DAD. */ 

mytab- xx)l2xird[i]_xml -name of element node; 

if (dxx_type — DXX_DECOMPOSE) 
mytab- xx) 12xmJ[iJjtmlTypc - DXX_ELEMENT, 

} 

mytab->num_col++; 
if( condition !- NULL ) 

add condition to mytab- >condition; 



[1015] Definition of a qualifying parent: 

[1016] The parent p of an element qualifies for a table t if 
all of the following four conditions are met; 

[1017] p exists. 

[1018] p is intermediate. 

[1019] p has muIti_occurrence--YES 

[1020] p does not have any child mapped to a table 
other than L 

[1021] Intuitively, if a parent qualifies for a table, it is a 
candidate to be chosen as the top-element of the table. 



[1022] Definition of top-element: 

[1023] Let e be the highest element in DAD that belongs 
to a table t. 

[1024] If e does not have a parent qualifying for t, then e 
is the top-element of t. 

[1025] Otherwise, the top-element of t is the highest 
element in the parent chain of e that does not have a parent 
qualifying for t. 



findTopElemeot(rel^iom_node, rdb_node, dxx_type) 

{ 

working variables: 

DXX_TAB 'mytab; 
int i; 

/* It is attributc_node, text _o ode, or a leaf clement _node in DAD, */ 
/• Note: table must be specified in RDB_node •/ 

mytab - rcl->tab{?] such that mytab-name - table of RDB_node 
if no mytab found 

return error, 
if ( mytab- >top_element — NULL ) { 

parent node ™ parent element node of this dom node; 

grandpareat_node = parent of the parcnt__nodc; 

while ( grandparent node exists && 

grandparent_aode is intermediate && 



US 2002/0123993 Al Sep. 5, 2002 

65 



-continued 



grandparent _pode has multiple occurrence && 
grandparent_node does not have children mapped to 
other tables) { 

parent_node - grandparent„_node; 

grand pare nt„n ode - parent of the grandpare nt_nodc ; 

} 

mytab->top_elemcnt * name of the parent_jKxlc; 

} 

} 

/• findTopEkments finds the top-element for each table in reL */ 

/* Because of a problem raised by Dorine Yelton, we can no longer find 

the top element during process _KDB_node in processed 1. V 
/• findTopEkments assumes that process _jell has been called. */ 
findTop Elements (rcl, dom_node, dxx_type) 

{ 

if (dom_node type — element_node) { 
rel->currenLJevel++; 
child - DOMGctFirstChild; 
while (child 1-NULL) { 

findTopElement8(rcl,cnild4xx„type); 

child - DOMGetNextchild; 

} 

} 

if (type — attribute__node (| 
type — texLJiode | 

(dom_node is a leaf element node && DXX_DECOMPOSE)) { 
call DOMgetduld to get the RDB_nodc of dom_node; 
if ( there is no RDB_nodc ) 
return error; 

call findTopElcmeat^rel/Jonx^node, RDB__node,dxx_type); 

} 

if (dom_node type — clcment_node) 
rel->current_level — ; 

} 

Example: 

rcl->tab[0}.name - "order tab"; 

rel->tab[0).primary_key_name - "order_Jcey"; 
rel->tab[0j.forcigix_kry.col - ""; 
rcl->tah(0].top_clcment - "Order"; 
rcl->tab(0].selcctDaPtr - NULL; 

rel->tab(0).num col - 3; 

rel->tab(0j->col2xmI[G).xml- "Key*; 

rel->tab[0j->col2ainl(0].col-"order_jtey"; 

rcl->tab(0}->CTl2xiiil[oidataTyrje-"integer*'; 

rcl->tab{0>>col2xir4oirmriVpe-D^ 

rel->tab{0>>coL2nnl[0}zailLevel-0; 

rel->tarj(0)->col2xnu(l ixml- "name"; 

rel->tab(0>>col2tanlf 1 j.col-"customer„name"; 

rel->tab[0>>col2xa^lldatoT>pe--char(12^"; 

rcl->tab[0]->col2xral[l IxmJType-DXX^TEXT, 

rcl->tab(0>>col2xinl(l ixmlLevel-2; 

rel->tabl0>>col2xml[2i.xml- "email"; 

rel->tab{0>>cot2x3n^2|col« M aistomer_eniail"; 

rcl->tab[0l->col2xinl(2idataT>pe--cbar(12^"; 

rcl->tob[0>>«)12amU2lniinVpc-DXX_TEXT, 

rel->tab(0]->03l2xinl[2].xmlLevel-2; 

rcl->tab(0).nunx__join - 1; 

rel->tab(0j->join_infc(0>>col - "key"; 

rcl->tab(0}->joiA_Jiifc(0]->minvgoin - 1; 

rel->tab(0}->joiii_info(0>>foreign(0].col - "o_kcy"; 

rel->tab(0>>join_info[0>>foreigtt[0].tab - "part_tah"; 

rei->tab(0>>jom_info(0>>foreign[0].indexInSOLDA - DXX_NONE; 

rcl->tab[0]->condition - NULL; 

rel->tab(l J.namc - *part_tab"; 

rel->tab{ 1 ).primary_key_Mune - "part_Jcey"; 

rel->tab(lJ.foreign_Jcey.col - "order_Jcey"; 

rcl->tab{ 1 ). foreign _ key. dataType - "integer; 

rel->tab(l J.foreign_Jtey.xml - "Key"; 

rel- >tar^l J. foreign .keyxmlType - DXX .ATTRIBUTE; 

rel->tab(l).top_element - "Part"; 

rel->tab( 1 J.BelcctDaPtr - NULL; 

rcl->tab[l].num_col - 5; 

rcl->tab[l]->col2xml(0}xml - "Color"; 

rcl->ub[l]->col2xnil[0}col - "Color"; 

rel->tai](lJ->col2xinllOJ.dataTypc="char(5)"; 

rcl->tab| 1 |->col2xmll0 j.xmrrype~DXX^lTRIBUTE; 
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rel->tab(l]->coL2xmi[0).xinlLevel-l; 

rcl->tab(l)->coL2xml(Hjanl- "Key"; 

rcl->Ub(lJ->coL2xmiIl ].col-"part_xey"; 

rel->tab{ 1 )->coL2rmlJ 1 J. dataType-" integer"; 

rel->^l]->col2xmJ|UxinnVpe»DXX_TEXP, 

rcl->tab(l>>col2xniinixmllxvel-2; 

rcl->tob(l>>coL2xml(2].xml- "Quantity"; 

rel->tab{ 1 J-x»12nnl(2icol-"qt/'; 

^el->^Ab(l>>col2xml(2i.da^aTypc-'•inleger r, ; 

rel->tab(l]->col2iml(2].iinriy^DXX_n£X'r, 

rel->tafa{l>>col2zml{2^miLevcl-2; 

rcl->tab(l]->coL2xmi[3j.xml- "ExtendedPrice"; 

rel->tahf 1 >>col2xml(3).col-**price'*; 

rcl->tab( 1 >>col2xml[3].date'I>pe-**rcar; 

rel->tab(lJ->a>l2xml{3l.xmriVpc-DXX_TEXT, 

rel->tab( 1 >x»[2zml[3lxmlLevel-2; 

rcl->tab(l)->tt>l2nnll4lxml- "lax"; 

rel->tab( 1 i-x»I2Kmll4l.col--Iax"; 

rel->tab( 1 )->col2xmil4 J. data1>pc-"rcai"; 

rel->tab(l}->col2xml{4].xmnVpc-DXX_TE)Cr, 

rcl->tab( 1 }->col2xmI[4], xmILevcl-2; 

rel->tarXl].mim_Join - 2; 

rel->tab(l)->join_info(0)->col - M o_key"; 

rcl->tah(l>>join_Jnfo(0>>typc - "integer"; 

rel->tah(l]->join_info{0]~>Bum_Join - 1; 

ret->tab{l]->join_Jnfc(0}->foreign[0].col - "order_Jkcy"; 

ict->tab(l)->join_info(0]->forciga[0].tab - "Qrdex_tab"; 

rcl->tab(l]->join_jnfc(0]->fore^ - DXX_JSONE; 

rel->tab{ 1 J->joia_info( 1 >>oo! - "partake y"; 

rcl->tab(l)->join^info[l)->type » "integer"; 

rel->tab(l)->join info(l}->num join - 1; 

icU>tab(l]->join_iDfo[l]->foreign{0]xol - "p_key"; 

rel->tab[l]->join_Jnfo[l}->foretgD[0].tab - "ship_tab"; 

re!->teb(l]->joiji_Jnfo(l>>foreign(0].indexlnSQLDA - DXX^NONE; 

rel->tah( 1 ]->corjdition - "price > 2500. OCT; 

rel->tah[2}.name - "ship tab"; 

rel- >iah(2]. prima ry_kry_narne - "ship_date"; 
rcl->tab(2).foreign_key.col - "part_key"; 

rcl->tab[2]. foreign key.dataType - "integer"; 

rcl->tab[2].forcign_JceyjunJ - "Key"; 

rel->tab[2].foreign_^y^rflJType - DXX_TEXT, 

rcl->tab[2].top_element - "Shipment"; 

rel->tab{2LselectDaPtr - NULL; 

reI->tab{2].mim_col - 2; 

rcl->tab[2]->col2zmlf01xinl- "ShipDate"; 

rel->tab(2>>a)l2xmll0jcol-"date"; 

rcl->tab{2]-><»12xir40idata'I>pc--date"; 

rcl->tob(2>>col2xmU0].xmn>^e-DXX_TEXT, 

rcl->tab[2>>col2xmll0jimiLevel-3; 

rcl->tab[2J->col2xinilliimi- "ShipMode"; 

rel->tab[2>>col2xmi(l^col-"mode"; 

rcl->tabl2>>col2xmUl J.dalaTypc--char(128)"; 

rel->tabC2]->col2imlflixmlType-DXX_TEXT, 

rc)->tabf2j->col2iml[l |xmlLevel-3; 

rcl->tab(2].mim_join « 1; 

rel->tab[2>>join_info[0>>col - "p_Jcey"; 

rcl->tab[2]->join_Jnfo(0}*>type - "integer"; 

re!->tab(2)->join_info[0]->nurn_4oin - 1; 

rel->tab(2}->join_infa(0>>forign[0].col - "parOcey"; 

rcl->torj(2]->joiri_Jnfo(0>>forign(0].tab - "part_tab"; 

rel->tab[2]->join__info(0>>foreigntOj.uidexlnSQLDA - DXX^NONE; 

rel->tab[2]->condition - "date > 4 1966-01-01 m ; 

rel->num_tables - 3; 



[1026] G.3.3 Prepare Tables 

-continued 

[1027] The following routine checks whether the tables in — 
the rel structure exist and creates new tables if they do not working variables: 

DXX__TAB *curreQt_tab; 
for ( i-0; i < rcl->num___ tables; i++ ) { 



current_tab •» rcl->table[ij; 



prepare _tables(DXX_REL *rel, lablespace) check whether the table current_tab->namc exist; 

{ if (table exists in database) 



US 2002/0123993 Al 



67 



Sep. 5, 2002 



-continued 



chedL_table(curreot_t&b, tablespace); 
else 

CTeate_tablc(cunent_tab); 

} 



[1028] G.3.3.1 Check Table 

[1029] It is important to ensure that the existing table in 
the database complies with the DXX_tab data structure in 
memory. 



[1030] G3.3.2 Create New Table for Decomposition 

[1031] The following routine creates a new table for 
decomposition. 



create_table(DXX_TAB *<bx_Jab, tablespace) { 

working variables: 

char sql stmtf]; 

char col_stmt[J 

char col_type{l 
spriiirf(ftql jtmL/CREATE table %& ( 4 */btx_tab->namc); 
/• create all columns from XML V 



checfc_table(DXX_TAB tab) { 
/* Local variables: */ 
int i; 

char mytypename{18]; 

long mylength; 

/• Local host variables: 7 

char tabname{8£ 

char colnamc{8£ 

char typcaamc(18j 

long length; 

strcpy(tabnamc, tab->name); 

/• Check col2xml. */ 

for (M); t<tab->num_col; i++) { 

strcpy(colname, tab->col2xml[i}.col); 
/* Try to get column type and length from syscaL 7 
EXEC SQL SELECT 
typename into rtypenamc, 
length into :Iength 
FROM syscaLcolumns 
WHERE tabuame — :tabname 
AND colname - xolnamc; 
if (SQLCODE — 100) { 
/* Error: Table <mv>tabname</mv> does not have column <mv>coLname</mv>. */ 

} 

Break down tab->col2xml[i].dataType into mytypename and mylength; 
if (mytypename is different from typename) { 
/* Error: Column <mv>colname</mv> of <m v> tab name </mv> 
should have type <mv>typename</mv>. 7 

} 

if ((length > 0) && 

(mylength > length)) { 
/* Error: Column <mv>colname</mv> of <mv> tab name </mv> 
cannot be longer than <mv>lenguWmv>. */ 

} 

} 

/• Check join_info. V 

for (i-O; i<tab~>num_join; i++) { 
strcpy(colnamc, tab->join_info(i).col); 
/* Try to get column type and length from syscaL 7 
EXEC SQL SELECT 

typename into : type name, 

length into :length 

FROM syscaLcolumns 

WHERE tabname - itabname 

AND colname * :colname; 
if (SQLCODE — 100) { 

/* Error: Table <mv>tabname</mv> does not have column <mv>colnamc</mv>. 7 

} 

} 

} 



US 2002/0123993 Al 



68 



Sep. 5, 2002 



•continued 



for ( i - 0; i < dxx_tab->nuiEL_col; i++ ){ 
init coL-Stmt; 

if ( dxx u tab->col2xml[i].col - dxx_tar>->primary_Jcey_jiame ) { 
/* key V 

sprintf(coL_£tniV*%8 %& not mill primary key", 
dxx_tab-xx>i2xml[i].col, dba_tab->col2xml[i].dataTypc); 

} 

else { 

sprintf(coLjrmt **%s %s 

dxx_tab->col2xml[i].col, dx^„tab->col2janl[i].datoTypc); 

} 

strat(sqLstmivcol_5tint); 

} 

if (dxx_tab->foreign_key.col[0]) { 
/• Add foreign key to sql_stmt. */ 
sprintfXooLstmt, **%s %s", 
dxx_tab->jow_Jnfc{0j.col t 
dxx_t&b->foreign_Jcey.dataType); 
strcat(sql_stmt, ooI__^tml); 
f or ( i - 0; i < dxx_tab->num_join; i++ ){ 
init col stmt; 

if (dxx_tab->join^jnfo[i)ibreign[0].col — 
dxx_tab->foreign_key.col) { 
sprintf(ool_8tmt," REFERENCES %B{%e))" > 

dxx_tab->joia_info( i]ioreign((}].tab , 
dxx_tab- >join__info( i}ioreign[0].col); 

} 

} 

strcat(8qLjtmt,coL_stmt); 

} 

Execute sql_stmt; 

} 



[1032] Examples: 

[1033] create table order_tab (order_key integer not 
null primary key, 

[1034] customer_name cbar(32), 

[1035] customer_email char(100)); 

[1036] create table part_tab (part_key integer not null 
primary key, 

[1037] price real, 

[1038] qty integer, 

[1039] tax real, 

[1040] o_key integer REFERENCES orderj- 
ab(order_key)); 



[1041] create table ship_tab (ship_date date not null 
primary key, 

[1042] ship mode cbar(64), 
[1043] comments cbai(2k), 
[1044] p_key integer REFERENCES partjab- 
(part_key)); 

[1045] G3.33 Insert XML Document Into an Enabled 
Collection 

[1046] The following insert will insert an XML document 
into an enabled collection. 
[1047] dxxlnsert(collection_name^cmlobj^qlstate, 
msgtext) 

[1048] collection_name — the name of enabled 

XML collection 
[1049] xmlobj the input XML document 

[1050] sqlstate OUT: the SQL state in case 

of errors 

[1051] msgtext OUT: message text in case 

of errors 

[1052] Note that in the following routine, the sqlstate and 
msgtext are ignored in this embodiment. 



dxxlnse recollection name, xmlobj, sqlstate, msgtext) { 

working variable: 

DOMElement dad_root w _elcmcnt; 

DOMNodc xml_rooL_clcmcnt; 

DOMNodelist rootXMLNodelist; 

DXX__REL relation_Jnfo; 

DXX_JIEL *rel - &relation_Jnfo; 
/* retrieve DAD from XML_usage table */ 
select DAD from xml_usage where col_name - collection 
/* get DAD and parse */ 
parse DAD; 

set dad_joot_element to be the root element_node of DAD; 
/* set relational structure */ 

setRel(rel^ad4)XX_DECOMPGSE); 
/* process relational infonnation */ 

procc8s^d(rcl^_root_clement^)4)XX_X)EOOMPOSE); 
/* parse the xmlobj to get the DOM tree of XMLobj V 
parse xmlobj to get the xmldom of XML obj; 
xmL_joot_clement - the root element of xmldom; 
create a rootXMLNodeList, which has only one node: 

xml_root_element; 
/* decompose XML obj */ 

aecompose_element(rel,rootXMl^odeIjst^lad__root element); 



[1053] G33.4 Decompose Element 

[1054] The following routine decomposes an element. 



decompose_element(DXX_REL *rel, 

DOMNodelist *xml_clcmcnl _nodcs, 
DOMElement *dad_element_nodc) 

{ 

working variables: 

char sql_stmt(]; /• stmt for insert */ 
char pre_insrt_stmt(l /• stmt for insert */ 
char dad_element_nameQ 
DXX_TAB "current_tab; 
DXX_ROWS dxx_row, 
DXX^ROWS •dxx_rows = &dxx_iow; 
char rowf fc 
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HSTMT hstmt; 

1. call DOMgetAttribute to get clcmcnl name from dad_c Ic men t_nodc; 

2. find cuneat_tab - rcl->tablc(ij such that 

rcl->table(i}.top_element - cicment_oame; 
if oot found then 

currcat_tab - rel->table(i] such that 
currcnt_tab->col2xml[?Lxml - clemcnt_name 
if (current__tab not found) { 

/* Call decompose _clemeai on each children. */ 

for each child node of the dad_element node do 

if ( the child node is an element node ) { 
child_name - child_dad_element_node's name; 
chudXMLNodcs - NULL; 
call dxx__gctNodcList(xmI_clemcnt_node&, 

child._aame,childXMLNodes); 
call decompose ^clcmen^rcl, childXMLnodes, 
child„dad_elemenL_node); 

} 

}else{ 

3. if (currcm_tab->sql_Btml(0] — *\0') { 

/* Generate the SQL insertion stmt with parameter markers. */ 
call generate_S<)Li(current_tab^)re_insert_8tmt); 
else 

strcpy(pre_insert L _8tmt, current_tab->sqL_stmt); 

4. /• Fill the rows with the data from xml_elcment__Dodes */ 

call gea_rows(rel, currents tab ,junl_c 1c mc nt nodes ,dad_c lement__node , 

dxx^_rows); 

5. /* insert the rows •/ 

CLI _AllocHandlc(SQL__HANDLE_STMT, rel->hdbc, & hstmt); 
CLI_Prepare(hstmt, saL_stmt); 
for ( i - 0; i < dxx__rows->num__rows; i++ ) { 
init sql_stmt; 

d^xrow„to_stmt(pTe_Jnsert_stmt, 

<brx__rows->row(iL 

cunent_tab->col2xmI , 

hstmt); 
CU_Execute(hstmt); 
Free dtEL_rows->row{il 

} 

Free dbot_rows->row; 
CUUFreeHandle(hstmt); 

6. for each child node of the dad_clemcnt_node do { 

if (( the child node is an cletnent_node ) AND 

(current_tab->top_element !- cbild_name) AND 

( ourreni_tab->a)12xrnl(?Jxml !- child __name)) { 
child_name - child_dad_element_node's name; 
childXMLNodes - NULL; 
call dxx__£etNocteList{xml_element__nod^ 

call decompose___clcmcnt(rcl, chi]dXMLDodcs,cbild_dad„el cmc n L__nodc) ; 
else /* covered by this one */ 
; /• do nothing */ 

} 

} /* end if (current _tab not found) */ 



[1055] G.3.3.4.1 Get Node List 

[1056] The following routine returns a list of nodes with a 
specified element_name from a right parent chain. 



dnL^tNc<leList(DOM__NodeList Dnl_element__node3, 

char * child name, 

DOM_NodeList •returnNodeList) 

{ 

/• Local variables: "/ 
mt n; 
int i 

n - Mil_element_nodes.gctLcngtbOi 
for (i-0; i<n; i++) { 

iml_element - xml_clcmcol_nodes.ilcm(i); 

children - xml_elemenLgetChildrenO; 



-continued 



for each xml_child in the children do { 

if ( Kml_child_name — child name) { 

add xml_child to retumNodc List; 

found - TRUE; 

} 

} 

if(oot found) { 

dn _gctNodeList(children^lcmciit _name^cUimNodeList); 

} 

}/* end of for i*/ 
return myn ode List; 

} 
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[1057] G.3.3.4.2 Generate SQL 

[1058] The following routine generates an SQL insertion 
statement with parameter markers. 



generate _SQLi(DXX_TAB *current_tab, /* input 7 
char •sql_stmt) /* output V 

{ 

wo dung variables: 

cnaiQ insert_8tml; 
/* Generate the first part of the statement, 

such as: INSERT INTO tl (n^,c)V 
sprintf(tnsert_stmt, "insert into %a (",current_tab->name); 
for (W); i<current_tab— >num__col; i++) { 
8printf(infiert _^stml, **%8**^currtiit_.tab->col2Knil(i).col); 

} 

/• Identify columns as the foreign key to parent table 7 
for (M); i<current_tab->for _key.num_col; t++) { 

sticat(inscrt_stmt, current_tab->join_infof i].col); 

strcat(inscrt_stmt, ",**); 

} 

Remove the last comma from insert_jttmt 
strcat(inseit_stmt, ")"); 
strcpy(sqL_stmt, insert_stmt); 

/* Generate the rest of the statement with parm__markers. 
For example, a complete statement may look like: 
INSERT INTO tl (w) SELECT * FROM 

TABLE (VAUJ ES (CAST(? AS INTEG ER) ,CAST(? AS DOUBLE), 

CAST(? AS CHAR<5)))) AS t(n,x,c) 
WHERE x»2 

•/ 

strcpy(cotNamea, sbxnr(insert_stmt, '(*)); 

parnx_markers[0] - '\0*; 

for (i-O; i<cnrrent_tab->num__.col; i-»-+) { 

strcat(pann_markers ( **CAST(? AS *"); 
sticat(parm_jnarker8, current__Uri>->col2xml[i]-dataTypc); 
strcat(parm_jnaxkers, "), "); 
} 

for (W); i<currcnl tab->for _Jcey.num_col; i++) { 

sticat(parm_markers, ~CAST(? AS 
strcat(parm_jnarkers > 

current_tab->for_key.coI[i].dataType); 

strcat(parm__markcrs, **); 

} 

Remove the last comma from parm_markcrs; 
sprintf(sql_stmt, 

~%e SELECT * FROM TABLE(VAHJES" 

-(%s)) AS tffca WHERE %e", 

insert_8tmt t pann_ markers, colNames, 

currenL_tab->condition); 

} 

Example: 

1. INSERT INTO order_tab (order _key, customer_name ) customer__email) 

SELECT * FROM TABLE(VALUES (CAST(7 AS INTEGER), CASTf? AS 
CHAR(128)), 

CAST(? AS CHAR(128))» AS 
t(order_key, customer _namc, customer_email) 

2. INSERT INTO part_tab (color, part__key, o_Jtey) 

SELECT * FROM TABLE(VALUES (CAST(? AS CHAR(10)),CAST(? AS INTEGER), 

CAST(? AS INTEGER))) AS 
t(color, part_key, o_key) 
WHERE price > 2500.00 

3. INSERT INTO ship„tab (ship_date, ship __mode, p_Jcey) 

SELECT * FROM TABLE(VALUES (CAST(? AS DATE), CAST(? AS CHAR (128)), 

CAST(? AS INTEGER))) AS 
t(ship_datc, ship_mode, p_key) 
WHERE date > 4 1966-01-01' 
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[1059] G.3.3.4.3 Generate Row Data Structure 

[1060] The following routine generates the row data struc- 
tures from the data of xml elements. 



gciL_rows(DXX_REL 'rel, 

DX?C_TAB •Current_tab, 
DOMNodeUst xml_element_nodes, 
DOMElcmeot dad_element_node, 
DXX_ROWS •dxxrows) 

{ 

working valublc: 

DXX_ROW •row, 

get the clement _mme by call DOMGetAttribute of dad clement node 

/* initial ite the internal data structure V 

dxxrows— >QiirtL_rows — 0; 

for(i - 0; i<MAX_NUM_ROWS; i++) 

dxxrow8->row(i] - NULL; 

/• loop to work on each instance of the xml eiement_oode •/ 

for (i - O, i < xml„element_nodes.lenglh; i++) { 

xml_elemeni - (element) xml_elemeot_jiodes.iteDi(f); 

call get_n>ws(rel, current_tab,xml elemcnt T dad_elemcnt_node > dxxrowB); 

} 



[1061] G.3.3A4 Get Row Data from an XML Element 

[1062] The following routine gets the row data from an 
XML element. 



get_jows(DXX_REL *rel, 

DXX_TAB * cunent_tab, 
DOMHcmcDl xxru_element > 

DOMElcmcQt dad__elemeni node, 

DXX_ROWDATA 'dxxrows) 

{ 

call DOMgetChildNodes to get a list of all children of 

dad clcmcnt__nodc 

for each chiId_Dodc, do { 
if fisChildAttiNode Q 
isChildTextNode || 
isChildElementNode) { 
set isChildlnCurrentTab to TRUE, if the child_node name equals 
currenL_tab->top__element or in its co)2xmlQ 
set isChfldTopOfAnotherTab to TRUE if the child_node name equals 
to another_tab->top_element where another_tab !- current_tab. 
Use rel->top_elements for fast search, 
if CisChudlnCurrentTab [| 
isChildlntcrmediate || 
isChildTextNode) { 
set isChildAttrNode to TRUE if the child_node is attribute_node; 
if (isChildAttrNode) { 

get attribute _name from child _node 
get_coldata(rel, current_tab^xxiows4)XX^ATTRIBUTE, 
xml_e lemcnL^ttribute_name) ; 

} 

if (isChildTextNode) { 

get clement_name from child node 

get_coldata(reJ, current_tab, dxxrows t DXX_TEXT, 
xml_element, elemcnl__name); 

} 

if (isChildElementNode && 
lisChildTopOfAnotherThb) { 
get name of the child node; 

Construct a single element node list (xml_elcment_nodes) for xml_eicmcnt; 
node List - getNodelist(xml_elemeQt_nodes; child_element_name); 
for each node in Nodlist do { 
call get_rows(rel, curTent_tab,child __xml_e lenient, 

dad_child_element_node 

dxxrows); 
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} 

} 

}clse{ 

/* Child is not in current tab. 
Do do thing. */ 

} 

} 

} 

if (there is no significant child && 

dad_element_jiode is in curcent_tab) { 
/* This dad__element_jiode is a leaf-node and should be inserted 

without decomposition. "/ 
geL_co1data(rcl, current_tab, dxxrows, DXX_ELEMENT, 
xml_element, element__namc); 

} 

} 



[1063] G.3.3A5 Get DaU from Column 

[1064] The following routine gets data from a column. 



/* Get data from its column. */ 
get_coldata(DXX_REL *rcl, 

DXX_TAB *current_tab, 

DXX_ROWS •dmows, 

int rmlType, 

DOMBement xml__clement, 
char •rmLnamc) 

{ 

/• working variables: "/ 

int collndcr; 

int coIType; 

int xmlLevel; 

DXX_ROW 'row, 

/* find column index */ 

for (i - 0; i < current_tab->num_col; i++) { 

if(currcnt„bib->col2xmI[i}jnil - xml_name) { 
oollndex - i; 

coIType - current_Jab->col2xml[i].dataTypc; 
xmlLevel - currcnt_tab->col2xmlJi]jmilLevel; 
break; 



-continued 



} 
} 

if (xmlType — DXX_^ATTRIBUTE) { 

data - xml_clemcriLDOMGetAttribute(xml_name); 
} else if(xmlTypc — DXX_TEXT) { 

xmL_text__nodelist - the text child nodes of xml_elcment; 

Concatenate all the text from xml_text_jnodelist and 
set it to xml_text_node; 

data - xml_tcxt_node.getNode Value0; 
}else{ 

/* xmlType — DXX_ELEMENT •/ 
data - serialize xml_element as XML; 

} 

length - strlen(data); 

if (coIType is char, varchar, date, time, or times tamp) { 
/* Enclose the data with single-quotes. */ 
check length with coIType; 
length+2; 



if (d^_jows->num_rows — 0) { 

allocate dxx_rows- >ro w(0 1 

dxx_jrows->num_row8++; 

row - dxx_jows->row[0j 

row->num_col - current_tab->num_col 

allocate row->coldata 
as an array of size ro w- >num_col; 

mcmsct row->coldata{.„] to NULL; 

get„fbrcign_Jcey_value(rel, 
xml_element, 
currcnt__tab->foreign_key, 
^row->coldata[rcrw->num_col-lj)); 

} 

else { 

previous _jow - dxx_rows [dtx_rows->num_rows-lJ 
if (previous _row->coldata[collndex] — NULL) { 
/* we know that it is not filled V 
row - previous _row, 

} 

else { 

/* previous row's coldata is filled "/ 

allocate dxx_rows[dxx_rows- >num_rows}; 

row » An mm[rfiT mux— >rnim irwraj; 

row->num_ool - current__tab->num_col 

allocate row->coldata 

as an array of size row->num_col 

copy_row(row,prevk>us _jowycolladexvcurrent__tab); 

get __forcign_Jocy_valuc(rel, 
xml_element, 
currenL_tab->forcign_key, 
&(row- >coldata[ row- > num__col- 1 D); 

dxx__rows->num_jiowB++; 



} 



} 



row->coldata[col lndex}-malloc(lengtiH-l); 

copy data to row->coldata[il add single-quotes if needed; 



[1065] G3.3A6 Get Foreign Key Values 



/* Get foreign key value by climbing the DOM tree. */ 
get _forcign__key_valuc(DXX__REL "rel, 

DOM Element xml_element, 

DXX_COL2XML *forcign_kcy, 
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char •*foreign_jcey_value) /• OUT •/ 

{ 

/* working variables: 7 

DOM_Element parent; 
char *data; 
DOM_NodeList al; 
DOM_Element e; 

parent - xml_demenLgetPaientNodeQ; 
while (parent is in the same table as xml_elcmenl || 
parent is an intermediate node) { 
parent - parent-getParentNodeO; 

} 

switch(foreign_Jtey->xmlTypc) { 
case DXX_ATTR[BUTE: 

data - paienLgetAttribute(foreign _Jcey->xml); 

break; 
case DXX_TEXT. 

nl - parettLgetElementsByTagName(fortign_key->xml); 

e * al.item(O); /* the first element whose name is foicign_Jcey~>xml */ 

data - c.getFirstChildO.getNodeVWucO; 

break; 
dcEauJt: 

/* error unknown foreign_lcey_ymlType 7 
} /• end of swfoch(foreigii_Jcey_*niJTvpe) 7 
Allocate •fareign_Jcey_value; 
Copy data into * forcign_Jtey_value and 

enclose the data with single-quotes if needed 

according to foieign_Jcey->dataType; 

} 

/* Copy the columns of a row down to the current level. 7 
copy_row(DXX_ROW * row, 

DXX_ROW *prcvious_row, 

int collndcx, 

DXX_TAB * current_tab) { 
num_coU - prcvioua_row->num_cols + 

(cunent_tab->foreign_key.coI(OD? 1: 0; 
for(i - 0; i < num_cols; i ++) { 

previous Level - current^ tab- >col2xml[i)jml Level; 
if (prcvioua_row->coldata[i] !- NU1J, && i I^doI Index 
&& prcviousLevel <- currcnC_tab->leveI) { 
row->coldata[i] - str&ip(previous_jow->coldata(iI); 
}else{ 
row->coldata(i] - NULL; 

} 

} 



[1066] G.3.3A7 Insert Bind 

[1067] The following routine binds the parameters of an 
INSERT statement with the data in a row structure using 
CLI. 



dxxrow_to_stmt(char "prefix, 
DXX_ROW 'row, 
COL2XML •col2xml, 
SQLHSTMT hstmt) 

{ 

/* Ijocal variables: 7 

CLI_type, CLI_size, CLI„digits, bufLcn; 
int i; 

for (M); i<row->num_col; i++) { 

Get the CLI parameters CU_type, CLLsizc, and Cli_digits. 
CLLJBbdParameter(hstmt > i+1, SQL_PARAM_INPUT, SQL_C_C11AR, 

CLI„type, CLI_size CU_digits, 

row->coldata[il buflxn, NIJ1 .1 .); 

} 

} 
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[1068] G.3.3.5 Shred XML Document into DB2 Data- 
bases 

[1069] The following routine shreds an XML document 
into a DB2 database. The stored procedure dxxShredXML( 
) works the same as dxx!nsertXML( ) except that it takes a 
DAD as the first input parameter instead of a name of an 
enabled XML collection. Therefore, it can be called without 
enabling an XML collection. 

[1070] The stored procedure dxxSbredXML( ) inserts an 
input XML document into an enabled XML collection 
according to the Xcollection specification in the input DAD. 
If the tables used in the Xcollection of the DAD do not exist 
or the columns do not meet the data types specified in the 
DAD mapping, an error will be returned. The stored proce- 
dure dxxShredXML( ) decomposes the input XML docu- 
ment and inserts fragmented XML data into the tables 
specified in the DAD. 

[1071] Stored Procedure Declaration: 

[1072] dxxShredXML(char(128) DAD_buf, /* input 
V 

[1073] CLOB xmlobj, f* input */ 

[1074] long *errC6de, /* output */ 

[1075] varcbar(1024) *errMsg) /* output */ 

[1076] Parameters: 

[1077] DAD_buf: a buffer containing the DAD, 

[1078] xmlobj: IN, an XML document object in 
XMLCLOB type. 

[1079] errCode: OUT, return code in case of error, 

[1080] errMsg:OUT, message text in case of error. 

[1081] The following is an example of the dxxShredXML( 
)call. 



EXEC SQL INCLUDE SQLCA; 

EXEC SQL BEGIN DECLARE SECTION; 

char DAD_bu5l28t /* name of an XML collection */ 

SQL TYPE is CLOB_FILE xmlDoc; /* input XML document V 

long enCode; /* error code */ 

char errMsgf 1024^ /* error message text */ 

Short (kui ind] 

short xmlDoc_Jnd 
short crrCode__ind 
short errMsg_ind; 
EXEC SQL END DECLARE SECTION; 

/* suppose the ptrBuf has DAD content */ 

/* initialize host variable and indicators */ 

strcpy(DAD_buf,ptrBuf); 

strcpy(xmlobj. name,"c :\xml\order 1 .nnT*) ; 

xmlobj .name_length^tr len(* 4 e:\xmI\order 1 junl"); 

xmlobj.fiJe_optioo-SQL_FILE _REAJ>, 

enCode -0; 

errMsgfO)- 4 \0'; 

dad_ind - 0 

xmlobj _ind * 0; 

enCodc_ind - -1; 

enMsg ind - -1; 

/• Call the stored procedure 7 

EXEC SQL CALL db2sml!dxxShrcdXML(:DAD_buf:dad_ixid; 
junlobj :xmlobj\_ind, 

:crrCod£:errCodc__ind,:erTMsg:crrMsg ind); 



[1082] If the content of DADbuf has the 
Litem_DAD3.dad content, then the dxxShredXML( ) call 
will decompose the input XML document 
"e:\xmI\orderl. xml" and insert data into the sales_prder 
collection tables. 

Conclusion 

[1083] This concludes the description of an embodiment 
of the invention. The following describes some alternative 
embodiments for accomplishing the present invention. For 
example, any type of computer, such as a mainframe, 
minicomputer, or personal computer, or computer configu- 
ration, such as a timesharing mainframe, local area network, 
or standalone personal computer, could be used with the 
present invention. 

[1084] The foregoing description of an embodiment of the 
invention has been presented for the purposes of illustration 
and description. It is not intended to be exhaustive or to limit 
the invention to the precise form disclosed. Many modifi- 
cations and variations are possible in light of the above 
teaching. It is intended that the scope of the invention be 
limited not by this detailed description, but rather by the 
claims appended hereto. 

What is claimed is: 

1. A method of locating data in a data store connected to 
a computer, the method comprising the steps of: 

creating a main table having a column for storing a 
document, wherein the document has one or more 
elements or attributes; 

creating one or more side tables, wherein each side table 
stores one or more elements or attributes; and 

using the side tables to locate data in the main table. 

2. The method of claim 1, wherein the document in the 
column is an extensible markup language document. 

3. The method of claim 1, wherein one or more side tables 
are created after the column for storing the document is 
enabled. 

4. The method of claim 1, further comprising generating 
the side tables using a data access definition. 

5. The method of claim 4, further comprising providing a 
graphical user interface to enable a user to create the data 
access definition. 

6. The method of claim 1, further comprising converting 
the elements or attributes to SQL data types. 

7. The method of claim 1, further comprising generating 
one or more triggers to provide synchronization between the 
main table and side tables. 

8. The method of claim 7, wherein a trigger is activated 
upon data being inserted into the column for storing a 
documeat. 

9. The method of claim 7, wherein a trigger is activated 
upon data being modified in the column for storing a 
document. 

10. The method of claim 1, wherein data is located using 
a location path. 

11. The method of claim 1, further comprising creating an 
index on each side table. 

12. The method of claim 11, wherein locating data in the 
main table further comprises: 

searching for the data in the side tables using the indexes; 
and 
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mapping data located in the side tables to data in the main 
table. 

13. The method of claim 1, further comprising searching 
from a join view. 

14. The method of claim 1, further comprising receiving 
a query on one or more side tables and using the side tables 
to locate data in the queried side tables. 

15. The method of claim 1, further comprising using an 
extracting user-defined function to locate data. 

16. The method of claim 1, further comprising searching 
on an element or attribute with multiple occurrences. 

17. The method of claim 1, further comprising performing 
a text search on the document. 

18. The method of claim 1, further comprising performing 
a range search. 

19. The method of claim 1, further comprising enabling 
the column. 

20. The method of claim 1, further comprising disabling 
the column. 

21. An apparatus for locating data in a data store, com- 
prising: 

a computer having a data store coupled thereto, wherein 
the data store stores data; and 

one or more computer programs, performed by the com- 
puter, for creating a main table having a column for 
storing a document, wherein the document has one or 
more elements or attributes, for creating one or more 
side tables, wherein each side table stores one or more 
elements or attributes, and for using the side tables to 
locate data in the main table. 

22. The apparatus of claim 21, wherein the document in 
the column is an extensible markup language document. 

23. The apparatus of claim 21, wherein one or more side 
tables are created after the column for storing the document 
is enabled. 

24. The apparatus of claim 21, further comprising gener- 
ating the side tables using a data access definition. 

25. The apparatus of claim 24, further comprising pro- 
viding a graphical user interface to enable a user to create the 
data access definition. 

26. The apparatus of claim 21, further comprising con- 
verting the elements or attributes to SQL data types. 

27. The apparatus of claim 21, further comprising gener- 
ating one or more triggers to provide synchronization 
between toe main table and side tables. 

28. The apparatus of claim 27, wherein a trigger is 
activated upon data being inserted into the column for 
storing a document. 

29. The apparatus of claim 27, wherein a trigger is 
activated upon data being modified in the column for storing 
a document. 

30. The apparatus of claim 21, wherein data is located 
using a location path. 

31. The apparatus of claim 21, further comprising creating 
an index on each side table. 

32. The apparatus of claim 31, wherein locating data in 
the main table further comprises: 

searching for the data in the side tables using the indexes; 
and 

mapping data located in the side tables to data in the main 
table. 



33. The apparatus of claim 21, further comprising search- 
ing from a join view. 

34. The apparatus of claim 21, further comprising receiv- 
ing a query on one or more side tables and using the side 
tables to locate data in the queried side tables. 

35. The apparatus of claim 21, further comprising using 
an extracting user-defined function to locate data. 

36. The apparatus of claim 21, further comprising search- 
ing on an element or attribute with multiple occurrences. 

37. The apparatus of claim 21, further comprising per- 
forming a text search on the document. 

38. The apparatus of claim 21, further comprising per- 
forming a range search. 

39. The apparatus of claim 21, further comprising 
enabling the cohimn. 

40. The apparatus of claim 21, further comprising dis- 
abling the column. 

41. An article of manufacture comprising a program 
storage medium readable by a computer and embodying one 
or more instructions executable by the computer to perform 
method steps for locating data in a data store connected to 
the computer, the method comprising the steps of: 

creating a main table having a column for storing a 
document, wherein the document has one or more 
elements or attributes; 

creating one or more side tables, wherein each side table 
stores one or more elements or attributes; and 

using the side tables to locate data in the main table. 

42. The article of manufacture of claim 41, wherein the 
document in the column is an extensible markup language 
document. 

43. The article of manufacture of claim 41, wherein one 
or more side tables are created after the column for storing 
the document is enabled. 

44. The article of manufacture of claim 41, further com- 
prising generating the side tables using a data access defi- 
nition. 

45. The article of manufacture of claim 44, further com- 
prising providing a graphical user interface to enable a user 
to create a data access definition. 

46. The article of manufacture of claim 41, further com- 
prising converting the elements or attributes to SQL data 
types. 

47. The article of manufacture of claim 41, further com- 
prising generating one or more triggers to provide synchro- 
nization between the main table and side tables. 

48. The article of manufacture of claim 47, wherein a 
trigger is activated upon data being inserted into the column 
for storing a document. 

49. The article of manufacture of claim 47, wherein a 
trigger is activated upon data being modified in the column 
for storing a document 

50. The article of manufacture of claim 41, wherein data 
is located using a location path. 

51. The article of manufacture of claim 41, further com- 
prising creating an index on each side table. 

52. The article of manufacture of claim 51, wherein 
locating data in the main table further comprises: 

searching for the data in the side tables using the indexes; 
and 

mapping data located in the side tables to data in the main 
table. 
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53. The article of manufacture of claim 41, further com- 
prising searching from a join view. 

54. The article of manufacture of claim 41, further com- 
prising receiving a query on one or more side tables and 
using the side tables to locate data in the queried side tables. 

55. The article of manufacture of claim 41, further com- 
prising using an extracting user-defined function to locate 
data. 

56. The article of manufacture of claim 41, further com- 
prising searching on an element or attribute with multiple 
occurrences. 

57. The article of manufacture of claim 41, further com- 
prising performing a text search on the document. 

58. The article of manufacture of claim 41, further com- 
prising performing a range search. 

59. A method of transforming data stored on a data storage 
device that is connected to a computer, the method com- 
prising: 

receiving a query that selects data in the data storage 
device; 

retrieving the selected data into a work space; and 

generating one or more XML documents to consist of the 
selected data. 

60. The method of claim 59, wherein the work space 
comprises a table having one or more columns and wherein 
the one or more XML documents are generated by mapping 
each column to an element or attribute of one of the XML 
documents. 

61. The method of claim 59, wherein the one or more 
XML documents are generated using a data access defini- 
tion. 

62. The method of claim 61, further comprising using a 
document type definition to validate the one or more XML 
documents. 

63. The method of claim 61, further comprising using a 
document type definition to prepare the document access 
definition. 

64. The method of claim 61, wherein the document access 
definition further comprises an Extensible Markup Lan- 
guage Path data model based definition of the one or more 
XML documents to be generated. 

65. The method of claim 59, wherein the work space 
comprises a table and further comprising mapping column 
names of a table to equivalence classes. 

66. The method of claim 65, wherein the equivalence 
classes are defined by a user. 

67 The method of claim 65, wherein the equivalence 
classes are defined by a heuristic approach. 

68. The method of claim 59, further comprising removing 
duplicates from the selected data. 

69. The method of claim 59, wherein generating one or 
more XML documents comprises using an Xcollection defi- 
nition that defines how to compose the one or more XML 
documents from the retrieved selected data. 

70. The method of claim 68, wherein the Xcollection 
definition is contained in a data access definition. 

71. The method of claim 68, wherein the Xcollection 
definition comprises an SQL_query element. 

72. The method of claim 59, further comprising, prior to 
retrieving data, parsing a document access definition. 



73. The method of claim 59, wherein the one or more 
XML documents are generated using a query, XML com- 
position stored procedures, and a document access defini- 
tion. 

74. The method of claim 59, wherein the data to generate 
one or more XML documents is stored in an XML collec- 
tion. 

75. The method of claim 59, wherein the one or more 
XML documents are shared between businesses. 

76. The method of claim 59, wherein the one or more 
XML documents are generated by stored procedures. 

77. The method of claim 59, wherein the stored proce- 
dures can be called from database client code. 

78. An apparatus for transforming data, comprising: 

a computer having a data store coupled thereto, wherein 
the data store stores data; and 

one or more computer programs, performed by the com- 
puter, for receiving a query that selects data in the data 
storage device, retrieving the selected data into a work 
space, and generating one or more XML documents to 
consist of the selected data. 

79. The apparatus of claim 78, wherein the work space 
comprises a table having one or more columns and wherein 
the one or more XML documents are generated by mapping 
each column to an element or attribute of one of the XML 
documents. 

80. The apparatus of claim 78, wherein the one or more 
XML documents are generated using a data access defini- 
tion. 

81. The apparatus of claim 80, further comprising using a 
document type definition to validate the one or more XML 
documents. 

82. The apparatus of claim 80, further comprising using a 
document type definition to prepare the document access 
definition. 

83. The apparatus of claim 80, wherein the document 
access definition further comprises an Extensible Markup 
Language Path data model based definition of the one or 
more XML documents to be generated. 

84 The apparatus of claim 78, wherein the work space 
comprises a table and further comprising creating mapping 
column names of a table to equivalence classes. 

85. The apparatus of claim 84, wherein the equivalence 
classes are defined by a user. 

86. The apparatus of claim 84, wherein the equivalence 
classes are defined by a heuristic approach. 

87. The apparatus of claim 78, further comprising remov- 
ing duplicates from the selected data. 

88. The apparatus of claim 78, wherein generating one or 
more XML documents comprises using an Xcollection defi- 
nition that defines how to compose the one or more XML 
documents from the retrieved selected data. 

89. The apparatus of claim 88, wherein the Xcollection 
definition is contained in a data access definition. 

90. The apparatus of claim 88, wherein the Xcollection 
definition comprises an SQL_query element. 

91. The apparatus of claim 78, further comprising, prior to 
retrieving data, parsing a document access definition. 

92. The apparatus of claim 78, wherein the one or more 
XML documents are generated using a query, XML com- 
position stored procedures, and a document access defini- 
tion. 



US 2002/0123993 Al Sep. 5, 2002 

77 



93. The apparatus of claim 78, wherein the data to 
generate one or more XML documents is stored in ao XML 
collection. 

94. The apparatus of claim 78, wherein the one or more 
XML documents are shared between businesses. 

95. The apparatus of claim 78, wherein the one or more 
XML documents are generated by stored procedures. 

96. The apparatus of claim 78, wherein the stored proce- 
dures can be called from database client code. 

97. An article of manufacture comprising a program 
storage medium readable by a computer and embodying one 
or more instructions executable by the computer to perform 
method steps for transforming data in a data store connected 
to the computer, the method comprising the steps of: 

receiving a query that selects data in the data storage 
device; 

retrieving the selected data into a work space; and 

generating one or more XML documents to consist of the 
selected data. 

98. The article of manufacture of claim 97, wherein the 
work space comprises a table having one or more columns 
and wherein the one or more XML documents are generated 
by mapping each column to an element or attribute of one of 
the XML documents. 

99. The article of manufacture of claim 97, wherein the 
one or more XML documents are generated using a data 
access definition. 

100. The article of manufacture of claim 99, further 
comprising using a document type definition to validate the 
one or more XML documents. 

101. The article of manufacture of claim 99, further 
comprising using a document type definition to prepare the 
document access definition. 

102. The article of manufacture of claim 99, wherein the 
document access definition further comprises an Extensible 
Markup Language Path data model based definition of the 
one or more XML documents to be generated. 

103. The article of manufacture of claim 97, further 
comprising mapping column names of a table to equivalence 
classes. 

104. The article of manufacture of claim 103, wherein the 
equivalence classes are defined by a user. 

105. The article of manufacture of claim 103, wherein the 
equivalence classes are defined by a heuristic approach. 

106. The article of manufacture of claim 97, further 
comprising removing duplicates from the selected data. 

107. The article of manufacture of claim 97, wherein 
generating one or more XML documents comprises using an 
an Xcollection definition that defines how to compose the 
one or more XML documents from the retrieved selected 
data. 

108. The article of manufacture of claim 107, wherein the 
Xcollection definition is contained in a data access defini- 
tion. 

109. The article of manufacture of claim 107, wherein the 
Xcollection definition comprises an SQL_query clement. 

110. The article of manufacture of claim 97, further 
comprising, prior to retrieving data, parsing a document 
access definition. 

111. The article of manufacture of claim 97, wherein the 
one or more XML documents are generated using a query, 
XML composition stored procedures, and a document access 
definition. 



112. The article of manufacture of claim 97, wherein the 
data to generate one or more XML documents is stored in an 
XML collection. 

113. The article of manufacture of claim 97, wherein the 
one or more XML documents are shared between busi- 
nesses. 

114. The article of manufacture of claim 97, wherein the 
one or more XML documents are generated by stored 
procedures. 

115. The article of manufacture of claim 97, wherein the 
stored procedures can be called from database client code. 

116. A method of transforming data stored on a data 
storage device that is connected to a computer, the method 
comprising: 

generating a document object model tree using a docu- 
ment access definition; 

traversing the document object model tree to obtain 
information to retrieve relational data; and 

mapping the relational data to one or more XML docu- 
ments. 

117. The method of claim 116, wherein the document 
access definition defines a mapping between the relational 
data and one or more XML documents. 

118. The method of claim 116, wherein the document 
object model tree comprises one or more relational database 
nodes. 

119. The method of claim 118, wherein a relational 
database node comprises an attribute node that maps to a 
column of a relational database table. 

120. The method of claim 118, wherein a relational 
database node comprises an element node that maps to a 
column of a relational database table. 

121. The method of claim 118, wherein a relational 
database node comprises a text node that maps to a column 
of a relational database table. 

122. The method of claim 118, wherein the relational 
database node identifies a relational table into which XML 
document data is to be stored. 

123. The method of claim 11 8, where in the relational 
database node identifies a column in a relational table that 
contains XML document data to be retrieved. 

124. The method of claim 118, wherein the relational 
database node identifies one or more predicates used to 
select column data from a relational table to store into one 
or more XML documents. 

125. The method of claim 118, wherein the relational 
database node identifies a join relationship for joining mul- 
tiple tables. 

126. The method of claim 125, wherein the relational 
database node identifies a primary and foreign key relation- 
ship for the join relationship. 

127. The method of claim 116, further comprising gen- 
erating queries to obtain relational data using the document 
object model tree. 

128. The method of claim 116, wherein the relational data 
comprises an attribute value to be written to an XML 
document. 

129. The method of claim 116, wherein the relational data 
comprises clement text to be written to an XML document. 

130. The method of claim 116, further comprising a stored 
procedure that receives the document access definition and 
outputs a table populated with the one or more XML 
documents. 
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131. An apparatus for transforming data, comprising: 

a computer having a data store coupled thereto, wherein 
the data store stores data; and 

one or more computer programs, performed by the com- 
puter, for generating a document object model tree 
using a document access definition, traversing the 
document object model tree to obtain information to 
retrieve relational data, and mapping the relational data 
to one or more XML documents. 

132. The apparatus of claim 131, wherein the document 
access definition defines a mapping between the relational 
data and one or more XML documents. 

133. The apparatus of claim 131, wherein the document 
object model tree comprises one or more relational database 
nodes. 

134. The apparatus of claim 133, wherein a relational 
database node comprises an attribute node that maps to a 
column of a relational database table. 

135. The apparatus of claim 133, wherein a relational 
database node comprises an element node that maps to a 
column of a relational database table. 

136. The apparatus of claim 133, wherein a relational 
database node comprises a text node that maps to a column 
of a relational database table. 

137. The apparatus of claim 133, wherein the relational 
database node identifies a relational table into which XML 
document data is to be stored. 

138. The apparatus of claim 133, wherein the relational 
database node identifies a column in a relational table that 
contains XML document data to be retrieved. 

139. The apparatus of claim 133, wherein the relational 
database node identifies one or more predicates used to 
select column data from a relational table to store into one 
or more XML documents. 

140. The apparatus of claim 133, wherein the relational 
database node identifies a join relationship for joining mul- 
tiple tables. 

141. The apparatus of claim 140, wherein the relational 
database node identifies a primary and foreign key relation- 
ship for the join relationship. 

142. The apparatus of claim 131, further comprising 
generating queries to obtain relational data using the docu- 
ment object model tree. 

143. The apparatus of claim 131, wherein the relational 
data comprises an attribute value to be written to an XML 
document. 

144. The apparatus of claim 131, wherein the relational 
data comprises element text to be written to an XML 
document. 

145. The apparatus of claim 131, further comprising a 
stored procedure that receives the document access defini- 
tion and outputs a table populated with the one or more XML 
documents. 

146. An article of manufacture comprising a program 
storage medium readable by a computer and embodying one 
or more instructions executable by the computer to perform 
steps for transforming data in a data store connected to the 
computer, comprising: 

generating a document object model tree using a docu- 
ment access definition; 

traversing the document object model tree to obtain 
information to retrieve relational data; and 



mapping the relational data to one or more XML docu- 
ments. 

147. The article of manufacture of claim 146, wherein the 
document access definition defines a mapping between the 
relational data and one or more XML documents. 

148. The article of manufacture of claim 146, wherein the 
document object model tree comprises one or more rela- 
tional database nodes. 

149. The article of manufacture of claim 148, wherein a 
relational database node comprises an attribute node that 
maps to a column of a relational database table. 

150. The article of manufacture of claim 148, wherein a 
relational database node comprises an element node that 
maps to a column of a relational database table. 

151. The article of manufacture of claim 148, wherein a 
relational database node comprises a text node that maps to 
a column of a relational database table. 

152. The article of manufacture of claim 148, wherein the 
relational database node identifies a relational table into 
which XML document data is to be stored. 

153. The article of manufacture of claim 148, wherein the 
relational database node identifies a column in a relational 
table that contains XML document data to be retrieved. 

154. The article of manufacture of claim 148, wherein the 
relational database node identifies one or more predicates 
used to select cohuntn data from a relational table to store 
into one or more XML documents. 

155. The article of manufacture of claim 148, wherein the 
relational database node identifies a join relationship for 
joining multiple tables. 

156. The article of manufacture of claim 155, wherein the 
relational database node identifies a primary and foreign key 
relationship for the join relationship. 

157. The article of manufacture of claim 146, further 
comprising generating queries to obtain relational data using 
the document object model tree. 

158. The article of manufacture of claim 146, wherein the 
relational data comprises an attribute value to be written to 
an XML document 

159. The article of manufacture of claim 146, wherein the 
relational data comprises element text to be written to an 
XML document. 

160. The article of manufacture of claim 146, further 
comprising a stored procedure that receives the document 
access definition and outputs a table populated with the one 
or more XML documents. 

161. A method of transforming data stored on a data store 
that is connected to a computer, comprising: 

receiving an XML document containing XML data; 

receiving a document access definition that identifies one 
or more relational tables and columns; and 

mapping the XML data to the relational tables and col- 
umns using the document access definition. 

162. The method of claim 161, further comprising gen- 
erating a first document object model tree using the XML 
document. 

163. The method of claim 162, wherein the first document 
object model tree is generated by parsing the XML docu- 
ment. 

164. The method of claim 161, further comprising gen- 
erating a second document object model tree using the 
document access definition. 
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165. The method of claim 164, wherein the second 
document object model tree is generated by parsing the 
document access definition. 

166. The method of claim 161, wherein mapping further 
comprises: 

generating a first document object model tree using data 
from the XML document; 

generating a second document object model tree using a 
document access definition; and 

mapping the data from the first document object model 
tree into columns in one or more relational tables using 
the second document object model tree. 

167. The method of claim 161, wherein the XML data is 
stored untagged into the relational tables. 

168. The method of claim 161, wherein the relational 
tables are new tables. 

169. The method of claim 161, wherein the relational 
tables are existing tables. 

170. The method of claim 161, wherein the mapping is 
performed by a stored procedure. 

171. An apparatus for transforming data, comprising: 

a computer having a data store coupled thereto, wherein 
the data store stores data; and 

one or more computer programs, performed by the com- 
puter, for receiving an XML document containing 
XML data, receiving a document access definition that 
identifies one or more relational tables and columns, 
and mapping the XML data to the relational tables and 
columns using the document access definition. 

172. The apparatus of claim 171, further comprising 
generating a first document object model tree using the XML 
document. 

173. The apparatus of claim 172, wherein the first docu- 
ment object model tree is generated by parsing the XML 
document. 

174. The apparatus of claim 171, further comprising 
generating a second document object model tree using the 
document access definition. 

175. The apparatus of claim 174, wherein the second 
document object model tree is generated by parsing the 
document access definition. 

176. The apparatus of claim 171, wherein mapping further 
comprises: 

generating a first document object model tree using data 
from the XML document; 

generating a second document object model tree using a 
document access definition; and 

mapping the data from the first document object model 
tree into columns in one or more relational tables using 
the second document object model tree. 



177. The apparatus of claim 171, wherein the XML data 
is stored untagged into the relational tables. 

178. Toe apparatus of claim 171, wherein the relational 
tables are new tables. 

179. The apparatus of claim 171, wherein the relational 
tables are existing tables. 

180. The apparatus of claim 171, wherein the mapping is 
performed by a stored procedure. 

181. An article of manufacture comprising a program 
storage medium readable by a computer and embodying one 
or more instructions executable by the computer to perform 
steps for transforming data in a data store connected to the 
computer, comprising: 

receiving XML document containing XML data; 

receiving a document access definition that identifies one 
or more relational tables and columns; and 

mapping the XML data to the relational tables and col- 
umns using the document access definition. 

182. The article of manufacture of claim 181, further 
comprising generating a first document object model tree 
using the XML document. 

183. The article of manufacture of claim 182, wherein the 
first document object model tree is generated by parsing the 
XML document. 

184. The article of manufacture of claim 181, further 
comprising generating a second document object model tree 
using the document access definition. 

185. The article of manufacture of claim 184, wherein the 
second document object model tree is generated by parsing 
the document access definition. 

186. The article of manufacture of claim 181, wherein 
mapping further comprises: 

generating a first document object model tree using data 
from the XML document; 

generating a second document object model tree using a 
document access definition; and 

mapping the data from the first document object model 
tree into columns in one or more relational tables using 
the second document object model tree. 

187. The article of manufacture of claim 181, wherein the 
XML data is stored untagged into the relational tables. 

188. The article of manufacture of claim 181, wherein the 
relational tables are new tables. 

189. The article of manufacture of claim 181, wherein the 
relational tables are existing tables. 

190. The article of manufacture of claim 181, wherein the 
mapping is performed by a stored procedure. 
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